Abstract

Consider a defined density on a set of very large dimension. It is quite difficult to find an estimate 
of this density from a data set. However, it is possible through a projection pursuit methodology 
to solve this problem. In his seminal article, Huber (see "Projection pursuit". Annals of Statistics, 
1985) demonstrates the interest of his method in a very simple given case. He considers the 
factorization of density through a Gaussian component and some residual density. Huber's work 
is based on maximizing relative entropy. Our proposal leads to a new algorithm. Furthermore, we 
will also consider the case when the density to be factorized is estimated from an i.i.d. sample. 
We will then propose a test for the factorization of the estimated density. Applications include a 
new test of fit pertaining to the Elliptical copulas. 
1. Outline of the article

The objective of Projection Pursuit is to generate one or several projections providing as
much information as possible about the structure of the data set regardless of its size.

Once a structure has been isolated, the corresponding data are eliminated from the data set.
Through a recursive approach, this process is iterated to find another structure in the remaining
data, until no further structure can be evidenced in the data left at the end.

Friedman (1984 and 1987) and Huber (1985) count among the first authors to have introduced
this type of approach for evidencing structures. They each describe, with many examples, how to
evidence such a structure and consequently how to estimate the density of such data through two
different methodologies each. Their work is based on maximizing relative entropy.
For a very long time, the two methodologies exposed by each of the above authors were thought
to be equivalent, but Mu Zhu (2004) showed it was in fact not the case when the number of
iterations in the algorithms exceeds the dimension of the space containing the data. In the present
article, we will therefore only focus on Huber's study while taking into account Mu Zhu's remarks.

At present, let us briefly introduce Huber's methodology. We will then expose our approach 
and objective. 
1.1. Huber's analytic approach

Let $f$ be a density on $\mathbb{R}^d$. We define an instrumental density $g$ with same mean and variance
as $f$. Huber's methodology requires us to start with performing the $K(f,g) = 0$ test - with $K$
being the relative entropy. Should this test turn out to be positive, then $f = g$ and the algorithm
stops. If the test were not to be verified, the first step of Huber's algorithm amounts to defining a
vector $a_1$ and a density $f^{(1)}$ by

$$a_1 = \arg\inf_{a\in\mathbb{R}^d_*} K\Big(f\frac{g_a}{f_a}, g\Big) \quad\text{and}\quad f^{(1)} = f\frac{g_{a_1}}{f_{a_1}}, \qquad (1.1)$$

where $\mathbb{R}^d_*$ is the set of non null vectors of $\mathbb{R}^d$, where $f_a$ (resp. $g_a$) stands for the density of $a^\top X$
(resp. $a^\top Y$) when $f$ (resp. $g$) is the density of $X$ (resp. $Y$). More exactly, this results from
the maximisation of $a \mapsto K(f_a,g_a)$ since $K(f,g) = K(f_a,g_a) + K(f\frac{g_a}{f_a},g)$ and it is assumed that
$K(f,g)$ is finite. In a second step, Huber replaces $f$ with $f^{(1)}$ and goes through the first step
again.

By iterating this process, Huber thus obtains a sequence $(a_1, a_2, \ldots)$ of vectors of $\mathbb{R}^d_*$ and a
sequence of densities $f^{(i)}$.

Remark 1.1. Huber stops his algorithm when the relative entropy equals zero or when his
algorithm reaches the $d^{th}$ iteration; he then obtains an approximation of $f$ from $g$:

When there exists an integer $j$ such that $K(f^{(j)},g) = 0$ with $j \le d$, he obtains $f^{(j)} = g$, i.e.
$f = g\prod_{i=1}^{j}\frac{f^{(i-1)}_{a_i}}{g_{a_i}}$, since by induction $f^{(j)} = f\prod_{i=1}^{j}\frac{g_{a_i}}{f^{(i-1)}_{a_i}}$. Similarly, when, for all $j$, Huber gets
$K(f^{(j)},g) > 0$ with $j \le d$, he assumes $g = f^{(d)}$ in order to derive $f = g\prod_{i=1}^{d}\frac{f^{(i-1)}_{a_i}}{g_{a_i}}$.

He can also stop his algorithm when the relative entropy equals zero without the condition $j \le d$
being met. Therefore, since by induction we have $f^{(j)} = f\prod_{i=1}^{j}\frac{g_{a_i}}{f^{(i-1)}_{a_i}}$ with $f^{(0)} = f$, we obtain
$g = f\prod_{i=1}^{j}\frac{g_{a_i}}{f^{(i-1)}_{a_i}}$. Consequently, we derive a representation of $f$ as $f = g\prod_{i=1}^{j}\frac{f^{(i-1)}_{a_i}}{g_{a_i}}$.

Finally, he obtains $K(f^{(0)},g) \ge K(f^{(1)},g) \ge \ldots \ge 0$ with $f^{(0)} = f$.

1.2. Huber's synthetic approach 

Keeping the notations of the above section, we start with performing the $K(f,g) = 0$ test;
should this test turn out to be positive, then $f = g$ and the algorithm stops, otherwise, the first
step of his algorithm would consist in defining a vector $a_1$ and a density $g^{(1)}$ by

$$a_1 = \arg\inf_{a\in\mathbb{R}^d_*} K\Big(f, g\frac{f_a}{g_a}\Big) \quad\text{and}\quad g^{(1)} = g\frac{f_{a_1}}{g_{a_1}}. \qquad (1.2)$$

More exactly, this optimisation results from the maximisation of $a \mapsto K(f_a,g_a)$ since $K(f,g) =
K(f_a,g_a) + K(f, g\frac{f_a}{g_a})$ and it is assumed that $K(f,g)$ is finite. In a second step, Huber replaces $g$
with $g^{(1)}$ and goes through the first step again. By iterating this process, Huber thus obtains a
sequence $(a_1, a_2, \ldots)$ of vectors of $\mathbb{R}^d_*$ and a sequence of densities $g^{(i)}$.

Remark 1.2. First, in a similar manner to the analytic approach, this methodology enables us
to approximate and even to represent $f$ from $g$:

To obtain an approximation of $f$, Huber either stops his algorithm when the relative entropy
equals zero, i.e. $K(f, g^{(j)}) = 0$ implies $g^{(j)} = f$ with $j \le d$, or when his algorithm reaches the $d^{th}$
iteration, i.e. he approximates $f$ with $g^{(d)}$.

To obtain a representation of $f$, Huber stops his algorithm when the relative entropy equals zero,
since $K(f, g^{(j)}) = 0$ implies $g^{(j)} = f$. Therefore, since by induction we have $g^{(j)} = g\prod_{i=1}^{j}\frac{f_{a_i}}{g^{(i-1)}_{a_i}}$ with
$g^{(0)} = g$, we then obtain $f = g\prod_{i=1}^{j}\frac{f_{a_i}}{g^{(i-1)}_{a_i}}$.

Second, he gets $K(f, g^{(0)}) \ge K(f, g^{(1)}) \ge \ldots \ge 0$ with $g^{(0)} = g$.

1.3. Proposal 

Let us first introduce the concept of $\Phi$-divergence.
Let $\varphi$ be a strictly convex function defined by $\varphi : \mathbb{R}^+ \to \mathbb{R}^+$, and such that $\varphi(1) = 0$. We define
a $\Phi$-divergence of $P$ from $Q$ - where $P$ and $Q$ are two probability distributions over a space $\Omega$
such that $Q$ is absolutely continuous with respect to $P$ - by

$$\Phi(Q, P) = \int \varphi\Big(\frac{dQ}{dP}\Big)\,dP.$$

Throughout this article, we will also assume that $\varphi(0) < \infty$, that $\varphi'$ is continuous and that this
divergence is greater than the $L^1$ distance - see also Annex A.1.
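
To make the definition concrete, the following minimal sketch evaluates the discrete analogue of $\Phi(Q,P) = \int \varphi(dQ/dP)\,dP$ for three usual generators. The function names and the two probability vectors are illustrative assumptions, not objects taken from the article; the generators are the standard textbook forms for the relative entropy, the $\chi^2$ divergence and the Hellinger distance.

```python
import math

# Convex generators phi with phi(1) = 0 (standard textbook forms, assumed here).
def phi_kl(t):
    # Relative entropy (Kullback-Leibler): phi(t) = t*log(t) - t + 1, with phi(0) = 1.
    return t * math.log(t) - t + 1 if t > 0 else 1.0

def phi_chi2(t):
    # Chi-square: phi(t) = (t - 1)^2 / 2.
    return 0.5 * (t - 1) ** 2

def phi_hellinger(t):
    # Hellinger: phi(t) = 2 * (sqrt(t) - 1)^2.
    return 2.0 * (math.sqrt(t) - 1) ** 2

def phi_divergence(phi, q, p):
    """Discrete analogue of Phi(Q, P) = integral of phi(dQ/dP) dP."""
    return sum(phi(qi / pi) * pi for qi, pi in zip(q, p) if pi > 0)

# Two hypothetical distributions on three points.
p = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]

kl_qp = phi_divergence(phi_kl, q, p)
chi2_qp = phi_divergence(phi_chi2, q, p)
hellinger_qp = phi_divergence(phi_hellinger, q, p)
```

With the relative-entropy generator, the sum reduces to $\sum_i q_i \log(q_i/p_i)$, and every divergence vanishes when the two arguments coincide.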

Now, let us introduce our algorithm. We start with performing the $\Phi(g,f) = 0$ test; should this
test turn out to be positive, then $f = g$ and the algorithm stops, otherwise, the first step of our
algorithm would consist in defining a vector $a_1$ and a density $g^{(1)}$ by

$$a_1 = \arg\inf_{a\in\mathbb{R}^d_*} \Phi\Big(g\frac{f_a}{g_a}, f\Big) \quad\text{and}\quad g^{(1)} = g\frac{f_{a_1}}{g_{a_1}}. \qquad (1.3)$$

Later on, we will prove that $a_1$ simultaneously optimises (1.1), (1.2) and (1.3).

In our second step, we will replace $g$ with $g^{(1)}$, and we will repeat the first step.

And so on, by iterating this process, we will end up obtaining a sequence $(a_1, a_2, \ldots)$ of vectors
in $\mathbb{R}^d_*$ and a sequence of densities $g^{(i)}$. We will thus prove that the underlying structures of $f$
evidenced through this method are identical to the ones obtained through Huber's method.
We will also evidence the above structures, which will enable us to infer more information on $f$
- see example below.

Remark 1.3. As in the previous algorithm, we first provide an approximation and even a
representation of $f$ from $g$:

To obtain an approximation of $f$, we stop our algorithm when the divergence equals zero, i.e.
$\Phi(g^{(j)},f) = 0$ implies $g^{(j)} = f$ with $j \le d$, or when our algorithm reaches the $d^{th}$ iteration, i.e.
we approximate $f$ with $g^{(d)}$.

To obtain a representation of $f$, we stop our algorithm when the divergence equals zero.
Therefore, since by induction we have $g^{(j)} = g\prod_{i=1}^{j}\frac{f_{a_i}}{g^{(i-1)}_{a_i}}$ with $g^{(0)} = g$, we then obtain $f = g\prod_{i=1}^{j}\frac{f_{a_i}}{g^{(i-1)}_{a_i}}$.

Second, we get $\Phi(g^{(0)},f) \ge \Phi(g^{(1)},f) \ge \ldots \ge 0$ with $g^{(0)} = g$.
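
The update $g^{(1)} = g\,f_{a_1}/g_{a_1}$ can be illustrated on a toy product density, as a hedged sketch. It assumes a bivariate $f(x_1,x_2) = n(x_1)h(x_2)$ with $n$ standard Gaussian and $h$ a Laplace density (both chosen here for illustration only), and it takes the direction $a = (0,1)$ as given instead of searching for it; all marginals are written in closed form rather than estimated.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def laplace_pdf(x, loc, scale):
    return math.exp(-abs(x - loc) / scale) / (2 * scale)

# Hypothetical non-Gaussian component h: Laplace(loc, scale), variance 2 * scale^2.
H_LOC, H_SCALE = 1.0, 0.5
H_VAR = 2 * H_SCALE ** 2

def f(x1, x2):
    # Target density: Gaussian in x1, Laplace in x2, independent coordinates.
    return normal_pdf(x1, 0.0, 1.0) * laplace_pdf(x2, H_LOC, H_SCALE)

def g(x1, x2):
    # Gaussian density with the same mean and covariance as f.
    return normal_pdf(x1, 0.0, 1.0) * normal_pdf(x2, H_LOC, H_VAR)

def f_a(t):
    # Density of a^T X for a = (0, 1) when X has density f.
    return laplace_pdf(t, H_LOC, H_SCALE)

def g_a(t):
    # Density of a^T Y for a = (0, 1) when Y has density g.
    return normal_pdf(t, H_LOC, H_VAR)

def g1(x1, x2):
    # One step of the update: g^(1) = g * f_a / g_a along a = (0, 1).
    return g(x1, x2) * f_a(x2) / g_a(x2)

grid = [(-2 + 0.5 * i, -1 + 0.5 * j) for i in range(9) for j in range(9)]
gap_before = max(abs(g(x1, x2) - f(x1, x2)) for x1, x2 in grid)
gap_after = max(abs(g1(x1, x2) - f(x1, x2)) for x1, x2 in grid)
```

In this toy case a single step along $a = (0,1)$ turns $g$ into $f$ up to rounding, which is the situation where the divergence reaches zero and the algorithm stops.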

Finally, the specific form of relationship (1.3) establishes that we deal with M-estimation. We
can therefore state that our method is more robust than Huber's - see Yohai (2008), Toma (2009)
as well as Huber (2004).

At present, let us study two examples:
Example 1.1. Let $f$ be a density defined on $\mathbb{R}^3$ by $f(x_1,x_2,x_3) = n(x_1,x_2)h(x_3)$, with $n$ being a
bi-dimensional Gaussian density, and $h$ being a non Gaussian density. Let us also consider $g$, a
Gaussian density with same mean and variance as $f$.

Since $g(x_1,x_2/x_3) = n(x_1,x_2)$, we then have $\Phi(g\frac{f_3}{g_3},f) = \Phi(n.f_3,f) = \Phi(f,f) = 0$ as $f_3 = h$, i.e.
the function $a \mapsto \Phi(g\frac{f_a}{g_a},f)$ reaches zero for $e_3 = (0,0,1)'$.
We therefore obtain $g(x_1,x_2/x_3) = f(x_1,x_2/x_3)$.

Example 1.2. Let us assume that the $\Phi$-divergence is greater than the $L^2$ norm. Let us consider
$(X_n)_{n\ge 0}$, the Markov chain with continuous state space $E$. Let $f$ be the density of $(X_0, X_1)$ and let
$g$ be the normal density with same mean and variance as $f$.

Let us now assume that $\Phi(g^{(1)},f) = 0$ with $g^{(1)}(x) = g(x)\frac{f_1}{g_1}$, i.e. let us assume that our algorithm
stops for $a_1 = (1,0)'$. Consequently, if $(Y_0,Y_1)$ is a random vector with $g$ density, then the
distribution law of $X_1$ given $X_0$ is Gaussian and is equal to the distribution law of $Y_1$ given $Y_0$.
And then, for any sequence $(A_i)$ - where $A_i \subset E$ - we have

$P(X_{n+1} \in A_{n+1} \mid X_0 \in A_0, X_1 \in A_1, \ldots, X_{n-1} \in A_{n-1}, X_n \in A_n)$
$= P(X_{n+1} \in A_{n+1} \mid X_n \in A_n)$, based on the very definition of a Markov chain,
$= P(X_1 \in A_1 \mid X_0 \in A_0)$, through the Markov property,
$= P(Y_1 \in A_1 \mid Y_0 \in A_0)$, as a consequence of the above nullity of the $\Phi$-divergence.

To recapitulate our method, if $\Phi(g,f) = 0$, we derive $f$ from the relationship $f = g$; should
a sequence $(a_i)_{i=1,\ldots,j}$, $j < d$, of vectors in $\mathbb{R}^d_*$ defining $g^{(j)}$ and such that $\Phi(g^{(j)},f) = 0$ exist, then
$f(./a_i^\top x, 1 \le i \le j) = g(./a_i^\top x, 1 \le i \le j)$, i.e. $f$ coincides with $g$ on the complement of the vector
subspace generated by the family $\{a_i\}_{i=1,\ldots,j}$ - see also section 2 for a more detailed explanation.

In this paper, after having clarified the choice of $g$, we will consider the statistical solution
to the representation problem, assuming that $f$ is unknown and $X_1, X_2, \ldots, X_m$ are i.i.d. with
density $f$. We will provide asymptotic results pertaining to the family of optimizing vectors $a_{k,m}$
- that we will define more precisely below - as $m$ goes to infinity. Our results also prove that
the empirical representation scheme converges towards the theoretical one. As an application,
section 3.4 permits a new test of fit pertaining to the copula of an unknown density $f$, section
3.5 gives us an estimate of a density deconvoluted with a Gaussian component and section 3.6
presents some applications to the regression analysis. Finally, we will present simulations.



2. The algorithm 



2.1. The model

As explained by Friedman (1984 and 1987) and Diaconis (1984), the choice of $g$ depends on
the family of distribution one wants to find in $f$. Until now, the choice has only been to use the
class of Gaussian distributions. This can be extended to the class of elliptic distributions with
almost all $\Phi$-divergences.

2.1.1. Elliptical laws

The interest of this class lies in the fact that conditional densities with elliptical distributions
are also elliptical - see Cambanis (1981), Landsman (2003). This very property allows us to use
this class in our algorithm.



Definition 2.1. $X$ is said to abide by a multivariate elliptical distribution - noted $X \sim E_d(\mu, \Sigma, \xi_d)$
- if $X$ presents the following density, for any $x$ in $\mathbb{R}^d$:

$$f_X(x) = \frac{c_d}{|\Sigma|^{1/2}}\,\xi_d\Big(\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\Big)$$

• with $\Sigma$, being a $d \times d$ positive-definite matrix and with $\mu$, being a $d$-column vector,

• with $\xi_d$, being referred to as the "density generator",

• with $c_d$, being a normalisation constant, such that $c_d = \frac{\Gamma(d/2)}{(2\pi)^{d/2}}\Big(\int_0^\infty x^{d/2-1}\xi_d(x)dx\Big)^{-1}$,
with $\int_0^\infty x^{d/2-1}\xi_d(x)dx < \infty$.



Property 2.1. 1/ For any $X \sim E_d(\mu, \Sigma, \xi_d)$, for any $A$, being a $m \times d$ matrix with rank $m \le d$, and
for any $b$, being an $m$-dimensional vector, we have $AX + b \sim E_m(A\mu + b, A\Sigma A', \xi_m)$.
Therefore, any marginal density of a multivariate elliptical distribution is elliptic, i.e.
$X = (X_1, X_2, \ldots, X_d) \sim E_d(\mu, \Sigma, \xi_d) \Rightarrow X_i \sim E_1(\mu_i, \sigma_i^2, \xi_1)$, $f_{X_i}(x) = \frac{c_1}{\sigma_i}\xi_1\Big(\frac{1}{2}\big(\frac{x-\mu_i}{\sigma_i}\big)^2\Big)$, $1 \le i \le d$.

2/ Corollary 5 of Cambanis (1981) states that conditional densities with elliptical distributions
are also elliptic. Indeed, if $X = (X_1, X_2)' \sim E_d(\mu, \Sigma, \xi_d)$, with $X_1$ (resp. $X_2$) being of size $d_1 < d$
(resp. $d_2 < d$), then $X_1/(X_2 = a) \sim E_{d_1}(\mu', \Sigma', \xi_{d_1})$ with $\mu' = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_2)$ and $\Sigma' =
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, with $\mu = (\mu_1, \mu_2)$ and $\Sigma = (\Sigma_{ij})_{1\le i,j\le 2}$.

Remark 2.1. Landsman (2003) shows that multivariate Gaussian distributions derive from
$\xi_d(x) = e^{-x}$. He also shows that if $X = (X_1, \ldots, X_d)$ has an elliptical density such that its marginals
verify $E(X_i) < \infty$ and $E(X_i^2) < \infty$ for $1 \le i \le d$, then $\mu$ is the mean of $X$ and $\Sigma$ is the covariance
matrix of $X$. Consequently, from now on, we will assume that we are in this case.
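
As a numerical check of the statement that the generator $\xi_d(x) = e^{-x}$ yields the Gaussian case, the sketch below evaluates an elliptical density of the form $\frac{c_d}{|\Sigma|^{1/2}}\xi_d(\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu))$ with $c_d = \frac{\Gamma(d/2)}{(2\pi)^{d/2}}(\int_0^\infty x^{d/2-1}\xi_d(x)dx)^{-1}$. This form of the density and of $c_d$ is assumed here (it is the usual density-generator parametrisation), the covariance is restricted to $\sigma^2 I_d$ for simplicity, and the function names, integration bound and step count are illustrative choices.

```python
import math

def normalising_constant(d, generator, upper=12.0, steps=200_000):
    """c_d = Gamma(d/2) / ((2*pi)^(d/2) * integral_0^inf x^(d/2-1) xi(x) dx).

    The substitution x = t^2 turns the integral into 2 * integral t^(d-1) xi(t^2) dt,
    which has no singularity at 0; it is evaluated with a midpoint rule on [0, upper].
    """
    h = upper / steps
    integral = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        integral += 2.0 * t ** (d - 1) * generator(t * t) * h
    return math.gamma(d / 2) / ((2 * math.pi) ** (d / 2) * integral)

def elliptical_pdf(x, mu, sigma2, generator):
    """Density of E_d(mu, sigma2 * I_d, generator) at x (spherical covariance only)."""
    d = len(x)
    quad = sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) / sigma2
    # |Sigma|^(1/2) = sigma2^(d/2) when Sigma = sigma2 * I_d.
    return normalising_constant(d, generator) / sigma2 ** (d / 2) * generator(0.5 * quad)

def gaussian_generator(u):
    return math.exp(-u)

value_1d = elliptical_pdf([0.3], [0.0], 2.0, gaussian_generator)
value_3d = elliptical_pdf([0.3, -0.2, 0.1], [0.0, 0.0, 0.0], 2.0, gaussian_generator)
```

Both values can be compared with the closed-form $N(0, 2 I_d)$ density at the same points; other generators can be passed in the same way, provided the integral in $c_d$ is finite and the bound `upper` is large enough for the tail to be negligible.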

Definition 2.2. Let $t$ be an elliptical density on $\mathbb{R}^k$ and let $q$ be an elliptical density on $\mathbb{R}^{k'}$. The
elliptical densities $t$ and $q$ are said to belong to the same family - or class - of elliptical densities,
if their generating densities are $\xi_k$ and $\xi_{k'}$ respectively, which belong to a common given family
of densities.

Example 2.1. Consider two Gaussian densities $N(0,1)$ and $N((0,0), Id_2)$. They are said to
belong to the same elliptical family as they both present $x \mapsto e^{-x}$ as generating density.

2.1.2. Choice of g 

Let us begin with studying the following case: 
Let $f$ be a density on $\mathbb{R}^d$. Let us assume there exist $d$ non null independent vectors $a_j$, with
$1 \le j \le d$, of $\mathbb{R}^d$, such that

$$f(x) = n(a_{j+1}^\top x, \ldots, a_d^\top x)\,h(a_1^\top x, \ldots, a_j^\top x), \qquad (2.1)$$

with $j < d$, with $n$ being an elliptical density on $\mathbb{R}^{d-j}$ and with $h$ being a density on $\mathbb{R}^j$, which
does not belong to the same family as $n$. Let $X = (X_1, \ldots, X_d)$ be a vector presenting $f$ as density.
Define $g$ as an Elliptical distribution with same mean and variance as $f$.

For simplicity, let us assume that the family $\{a_j\}_{1\le j\le d}$ is the canonical basis of $\mathbb{R}^d$:
The very definition of $f$ implies that $(X_{j+1}, \ldots, X_d)$ is independent from $(X_1, \ldots, X_j)$. Hence, the
density of $(X_{j+1}, \ldots, X_d)$ given $(X_1, \ldots, X_j)$ is $n$.

Let us assume that $\Phi(g^{(j)},f) = 0$, for some $j \le d$. We then get
$\frac{f(x)}{f_{a_1}f_{a_2}\cdots f_{a_j}} = \frac{g(x)}{g^{(1-1)}_{a_1}g^{(2-1)}_{a_2}\cdots g^{(j-1)}_{a_j}}$, since,
by induction, we have $g^{(j)}(x) = g(x)\frac{f_{a_1}}{g^{(1-1)}_{a_1}}\frac{f_{a_2}}{g^{(2-1)}_{a_2}}\cdots\frac{f_{a_j}}{g^{(j-1)}_{a_j}}$.

Consequently, the fact that conditional densities with elliptical distributions are also elliptical as
well as the above relationship enable us to infer that

$$n(a_{j+1}^\top x, \ldots, a_d^\top x) = f(./a_i^\top x, 1 \le i \le j) = g(./a_i^\top x, 1 \le i \le j).$$

In other words, $f$ coincides with $g$ on the complement of the vector subspace generated by the
family $\{a_i\}_{i=1,\ldots,j}$.
Now, if the family $\{a_j\}_{1\le j\le d}$ is no longer the canonical basis of $\mathbb{R}^d$, then this family is again a
basis of $\mathbb{R}^d$. Hence, lemma F.1 - page 24 - implies that

$$g(./a_1^\top x, \ldots, a_j^\top x) = n(a_{j+1}^\top x, \ldots, a_d^\top x) = f(./a_1^\top x, \ldots, a_j^\top x), \qquad (2.2)$$

which is equivalent to having $\Phi(g^{(j)},f) = 0$ - since by induction $g^{(j)} = g\frac{f_{a_1}}{g^{(1-1)}_{a_1}}\frac{f_{a_2}}{g^{(2-1)}_{a_2}}\cdots\frac{f_{a_j}}{g^{(j-1)}_{a_j}}$.

The end of our algorithm implies that $f$ coincides with $g$ on the complement of the vector
subspace generated by the family $\{a_i\}_{i=1,\ldots,j}$. Therefore, the nullity of the $\Phi$-divergence provides us
with information on the density structure.

In summary, the following proposition clarifies our choice of $g$, which depends on the family of
distribution one wants to find in $f$:

Proposition 2.1. With the above notations, $\Phi(g^{(j)},f) = 0$ is equivalent to

$$g(./a_1^\top x, \ldots, a_j^\top x) = f(./a_1^\top x, \ldots, a_j^\top x).$$

More generally, the above proposition leads us to defining the co-support of $f$ as the vector
space generated from vectors $a_1, \ldots, a_j$.

Definition 2.3. Let $f$ be a density on $\mathbb{R}^d$. We define the co-vectors of $f$ as the sequence of vectors
$a_1, \ldots, a_j$ which solves the problem $\Phi(g^{(j)},f) = 0$ where $g$ is an Elliptical distribution with same
mean and variance as $f$. We define the co-support of $f$ as the vector space generated from
vectors $a_1, \ldots, a_j$.

Remark 2.2. Any $(a_i)$ family defining $f$ as in (2.1) is an orthogonal basis of $\mathbb{R}^d$ - see lemma F.2.

2.2. Stochastic outline of our algorithm 

Let $X_1, X_2, \ldots, X_m$ (resp. $Y_1, Y_2, \ldots, Y_m$) be a sequence of $m$ independent random vectors with
same density $f$ (resp. $g$). As customary in nonparametric $\Phi$-divergence optimizations, all
estimates of $f$ and $f_a$ as well as all uses of Monte Carlo's methods are being performed using
subsamples $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ - extracted respectively from $X_1, X_2, \ldots, X_m$ and
$Y_1, Y_2, \ldots, Y_m$ - since the estimates are bounded below by some positive deterministic sequence
$\theta_m$ - see Annex B.

Let $P_n$ be the empirical measure of the subsample $X_1, X_2, \ldots, X_n$. Let $f_n$ (resp. $f_{a,n}$ for any $a$ in $\mathbb{R}^d_*$)
be the kernel estimate of $f$ (resp. $f_a$), which is built from $X_1, X_2, \ldots, X_n$ (resp. $a^\top X_1, a^\top X_2, \ldots, a^\top X_n$).
As defined in section 1.3, we introduce the following sequences $(a_k)_{k\ge 1}$ and $(g^{(k)})_{k\ge 1}$:

• $a_k$ is a non null vector of $\mathbb{R}^d$ such that $a_k = \arg\min_{a\in\mathbb{R}^d_*} \Phi\Big(g^{(k-1)}\frac{f_a}{g^{(k-1)}_a}, f\Big)$, (2.3)

• $g^{(k)}$ is the density such that $g^{(k)} = g^{(k-1)}\frac{f_{a_k}}{g^{(k-1)}_{a_k}}$ with $g^{(0)} = g$.

The stochastic setting up of the algorithm uses $f_n$ and $g_n^{(0)} = g$ instead of $f$ and $g^{(0)} = g$ - since
$g$ is known. Thus, at the first step, we build the vector $\check a_1$ which minimizes the $\Phi$-divergence
between $f_n$ and $g\frac{f_{a,n}}{g_a}$ and which estimates $a_1$:

Proposition B.1 page 20 and lemma F.6 page 25 enable us to minimize the $\Phi$-divergence
between $f_n$ and $g\frac{f_{a,n}}{g_a}$. Defining $\check a_1$ as the argument of this minimization, proposition 3.3 page 8
shows us that this vector tends to $a_1$.

Finally, we define the density $\check g_n^{(1)}$ as $\check g_n^{(1)} = g\frac{f_{\check a_1,n}}{g_{\check a_1}}$, which estimates $g^{(1)}$ through theorem 3.1.
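
The kernel estimate $f_{a,n}$ of the projected density $f_a$ can be sketched as follows. This is only an illustration under stated assumptions: a Gaussian kernel, a normal-reference rule-of-thumb bandwidth ($1.06\,\hat\sigma\,n^{-1/5}$), and a simulated standard Gaussian sample in dimension 3; the article does not specify the kernel or bandwidth, the truncation of the sample described above is not reproduced, and all function names are hypothetical.

```python
import math
import random

def project(sample, a):
    """Return the projected sample a^T X_1, ..., a^T X_n."""
    return [sum(ai * xi for ai, xi in zip(a, x)) for x in sample]

def rule_of_thumb_bandwidth(points):
    # Normal-reference rule: 1.06 * standard deviation * n^(-1/5) (an assumed choice).
    n = len(points)
    mean = sum(points) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in points) / (n - 1))
    return 1.06 * std * n ** (-0.2)

def gaussian_kde(points, bandwidth):
    """Return t -> (1 / (n*h)) * sum_i K((t - p_i) / h) with K the standard normal density."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    def estimate(t):
        return norm * sum(math.exp(-0.5 * ((t - p) / bandwidth) ** 2) for p in points)
    return estimate

random.seed(0)
# Hypothetical sample X_1, ..., X_n in dimension 3.
sample = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
a = (1.0, 1.0, 0.0)

projected = project(sample, a)
f_a_n = gaussian_kde(projected, rule_of_thumb_bandwidth(projected))
```

The same construction applied to $a^\top Y^{(1)}_1, \ldots, a^\top Y^{(1)}_n$ would give the estimate of $g^{(1)}_a$ needed from the second step onwards.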

Now, from the second step and as defined in section 1.3, the density $g^{(k-1)}$ is unknown.
Consequently, once again, we have to truncate the samples:

All estimates of $f$ and $f_a$ (resp. $g^{(1)}$ and $g^{(1)}_a$) are being performed using a subsample $X_1, X_2, \ldots, X_n$
(resp. $Y^{(1)}_1, Y^{(1)}_2, \ldots, Y^{(1)}_n$) extracted from $X_1, X_2, \ldots, X_m$ (resp. $Y^{(1)}_1, Y^{(1)}_2, \ldots, Y^{(1)}_m$ - which is a sequence of
$m$ independent random vectors with same density $g^{(1)}$) such that the estimates are bounded below
by some positive deterministic sequence $\theta_m$ - see Annex B.

Let $P_n$ be the empirical measure of the subsample $X_1, X_2, \ldots, X_n$. Let $f_n$ (resp. $g_n^{(1)}$, $f_{a,n}$, $g^{(1)}_{a,n}$ for
any $a$ in $\mathbb{R}^d_*$) be the kernel estimate of $f$ (resp. $g^{(1)}$ and $f_a$ as well as $g^{(1)}_a$) which is built from
$X_1, X_2, \ldots, X_n$ (resp. $Y^{(1)}_1, \ldots, Y^{(1)}_n$ and $a^\top X_1, a^\top X_2, \ldots, a^\top X_n$ as well as $a^\top Y^{(1)}_1, \ldots, a^\top Y^{(1)}_n$).

The stochastic setting up of the algorithm uses $f_n$ and $g_n^{(1)}$ instead of $f$ and $g^{(1)}$. Thus, we build
the vector $\check a_2$ which minimizes the $\Phi$-divergence between $f_n$ and $g_n^{(1)}\frac{f_{a,n}}{g^{(1)}_{a,n}}$ - since $g^{(1)}$ and $g^{(1)}_a$ are
unknown - and which estimates $a_2$. Proposition B.1 page 20 and lemma F.6 page 25 enable us
to minimize the $\Phi$-divergence between $f_n$ and $g_n^{(1)}\frac{f_{a,n}}{g^{(1)}_{a,n}}$. Defining $\check a_2$ as the argument of this
minimization, proposition 3.3 page 8 shows us that this vector tends to $a_2$ in $n$. Finally, we define the
density $\check g_n^{(2)}$ as $\check g_n^{(2)} = g_n^{(1)}\frac{f_{\check a_2,n}}{g^{(1)}_{\check a_2,n}}$, which estimates $g^{(2)}$ through theorem 3.1.

And so on, we will end up obtaining a sequence $(\check a_1, \check a_2, \ldots)$ of vectors in $\mathbb{R}^d_*$ estimating the
co-vectors of $f$ and a sequence of densities $(\check g_n^{(k)})_k$ such that $\check g_n^{(k)}$ estimates $g^{(k)}$ through theorem 3.1.



3. Results 

3.1. Convergence results 
3.1.1. Hypotheses on f 

In this paragraph, we define the set of hypotheses on / which could possibly be of use in our 
work. Discussion on several of these hypotheses can be found in Annex E.
In this section, to be more legible we replace $g$ with $g^{(k-1)}$. Let

= ]R'', ©* = {/7€0| ^^*{^'(«j^^^))dV<^], 

$P_nM(b,a) = \int M(b,a,x)\,dP_n$, $PM(b,a) = \int M(b,a,x)\,dP$,
where $P$ is the probability measure presenting $f$ as density.



Similarly as in chapter V of Van der Vaart (1998), let us define:



(H1) : For all $\varepsilon > 0$, there is $\eta > 0$, such that for all $c \in \Theta^\Phi$ verifying $\|c - a_k\| \ge \varepsilon$,
we have $PM(c,a) - \eta > PM(a_k,a)$, with $a \in \Theta$.
(H2) : $\exists\, Z < 0$, $n_0 > 0$ such that $(n \ge n_0 \Rightarrow \sup_{a\in\Theta}\sup_{c\in\{\Theta^\Phi\}^c} P_nM(c,a) < Z)$.
(H3) : There is a neighbourhood $V$ of $a_k$, and a positive function $H$, such that,
for all $c \in V$, we have $|M(c,a_k,x)| \le H(x)$ ($P$-a.s.) with $PH < \infty$.
(H4) : There is a neighbourhood $V$ of $a_k$, such that for all $\varepsilon$, there is a $\eta$ such that for
all $c \in V$ and $a \in \Theta$, verifying $\|a - a_k\| \ge \varepsilon$, we have $PM(c,a_k) < PM(c,a) - \eta$.
Putting /„, = ^^(gh.J), and x pib,a,x) = let us now consider 

three new hypotheses: 

{H5) : The function ip is in (0, -1-00) and there is a neighbourhood Vj^ of (0^, a^) such that, for 
all (b, a) of V' the gradient ^(^^^0^) and the Hessian Ki^^^j^^) exist {A_a.s.), and 
the first order partial derivatives "^'•"^''{"'f' .^^ and the first and second order derivatives of
(b, a) i-> p(b, a, x) are dominated (^_a.s.) by /l-integrable functions. 
{H6) : The function (b, a) i-> M{b, a) is in a neighbourhood Vk of (a^, at) for all jc; and the 
partial derivatives of (b, a) i-> M{b, a) are all dominated in by a P_integrable function 
H(x). 

(HI) : P\\-^M(ak, and P\\-^M(ak, ak)\\^ are finite and the expressions P^£^M{ak, ai) and 

/aj exist and are invertible. 
Finally, we define 

(H8) : There exists $k$ such that $PM(a_k,a_k) = 0$.
(H9) : $(Var_P(M(a_k,a_k)))^{1/2}$ exists and is invertible.
(H0) : $f$ and $g$ are assumed to be positive and bounded.

3.1.2. Estimation of the first co-vector of f 

Let $\mathcal{R}$ be the class of all positive functions $r$ defined on $\mathbb{R}$ and such that $g(x)r(a^\top x)$ is a density
on $\mathbb{R}^d$ for all $a$ belonging to $\mathbb{R}^d_*$. The following proposition shows that there exists a vector $a$ such
that $\frac{f_a}{g_a}$ minimizes $\Phi(gr,f)$ in $r$:

Proposition 3.1. There exists a vector $a$ belonging to $\mathbb{R}^d_*$ such that

$$\arg\min_{r\in\mathcal{R}} \Phi(gr,f) = \frac{f_a}{g_a} \quad\text{and}\quad r(a^\top x) = \frac{f_a(a^\top x)}{g_a(a^\top x)}.$$

Remark 3.1. This proposition proves that $a_1$ simultaneously optimises (1.1), (1.2) and (1.3).
In other words, it proves that the underlying structures of $f$ evidenced through our method are
identical to the ones obtained through Huber's methods - see also Annex D.



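Following Broniatowski (2009), let us introduce the estimate of $\Phi(g\frac{f_{a,n}}{g_a}, f_n)$, through

$$\check\Phi\Big(g\frac{f_{a,n}}{g_a}, f_n\Big) = \int M(a,a,x)\,dP_n(x).$$

Proposition 3.2. Let $\check a$ be such that $\check a := \arg\inf_{a\in\mathbb{R}^d_*} \check\Phi(g\frac{f_{a,n}}{g_a}, f_n)$.
Then, $\check a$ is a strongly convergent estimate of $a$, as defined in proposition 3.1.

Let us also introduce the following sequences $(\check a_k)_{k\ge 1}$ and $(\check g_n^{(k)})_{k\ge 1}$, for any given $n$:

• $\check a_k$ is an estimate of $a_k$ as defined in proposition 3.2 with $\check g_n^{(k-1)}$ instead of $g$,

• $\check g_n^{(k)}$ is such that $\check g_n^{(0)} = g$, $\check g_n^{(k)}(x) = \check g_n^{(k-1)}(x)\frac{f_{\check a_k,n}(\check a_k^\top x)}{[\check g^{(k-1)}]_{\check a_k,n}(\check a_k^\top x)}$, i.e. $\check g_n^{(k)}(x) = g(x)\prod_{j=1}^{k}\frac{f_{\check a_j,n}(\check a_j^\top x)}{[\check g^{(j-1)}]_{\check a_j,n}(\check a_j^\top x)}$.

We also note that $\check g_n^{(k)}$ is a density.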

3.1.3. Convergence study at the $k^{th}$ step of the algorithm

In this paragraph, we will show that the sequence $(\check a_k)_n$ converges towards $a_k$ and that the
sequence $(\check g_n^{(k)})_n$ converges towards $g^{(k)}$.

Let $\check c_n(a) = \arg\sup_{c\in\Theta} P_nM(c,a)$, with $a \in \Theta$, and $\check\gamma_n = \arg\inf_{a\in\Theta}\sup_{c\in\Theta} P_nM(c,a)$. We state

Proposition 3.3. Both $\sup_{a\in\Theta}\|\check c_n(a) - a_k\|$ and $\check\gamma_n$ converge toward $a_k$ a.s.

Finally, the following theorem shows that $\check g_n^{(k)}$ converges almost everywhere towards $g^{(k)}$:

Theorem 3.1. It holds $\check g_n^{(k)} \to_n g^{(k)}$ a.s.
3.2. Asymptotic Inference at the $k^{th}$ step of the algorithm

The following theorem shows that converges towards g^*' at the rate Op{n^^) in three 
differents cases, namely for any given x, with the L' distance and with the relative entropy: 

Theorem 3.2. It holds \£\x) - g^''\x)\ = Op{n-^), J \gf\x) - g^''>(x)\dx = Op{n-^) and 
\K(gf\f)-K(/'\f)\^Oj.(n-M. 

"Remark 3.2. With the relative entropy, we have n — 0(m^^^) - see lemma \F.13\ The above rates 
consequently become Op{m^^). 

The following theorem shows that the laws of our estimators of $a_k$, namely $\check c_n(a_k)$ and $\check\gamma_n$,
converge towards a linear combination of Gaussian variables.

Theorem 3.3. It holds 

^/HJl.Cc„(at) - at) ^-T S.yVd(0,P|||M(fl,,fli)||2) +0.^^(0, P||£M(fl,,fl,)||2) and 

V^yi.(y„ - fl,) C.NAO,niM(ak,ak)f) +C.NAO,niM{''k,ak)f) 

where = P^M(ak, ak)(Pg^M(ak, at) + V^^M{ak, ak)), C = Pgf^M(flt, at) and 

^ = ^Ab^iak, ak) + Pg^M(at, at) + P^M(a,, a,). 

3.3. A stopping rule for the procedure 

In this paragraph, we will call $\check g_n^{(k)}$ (resp. $\check g^{(k)}_{a,n}$) the kernel estimator of $\check g^{(k)}$ (resp. $\check g^{(k)}_a$). We will
first show that $\check g_n^{(k)}$ converges towards $f$ in $k$ and $n$. Then, we will provide a stopping rule for this
identification procedure.

3.3.1. Estimation of f

The following theorem provides us with an estimate of $f$:

Theorem 3.4. We have $\lim_n \lim_k \check g_n^{(k)} = f$ a.s.

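Consequently, the following corollary shows that $\Phi\Big(\check g_n^{(k-1)}\frac{f_{a_k,n}}{[\check g^{(k-1)}]_{a_k,n}}, f_n\Big)$ converges towards zero
as $k$ and then as $n$ go to infinity:

Corollary 3.1. We have $\lim_n \lim_k \Phi\Big(\check g_n^{(k-1)}\frac{f_{a_k,n}}{[\check g^{(k-1)}]_{a_k,n}}, f_n\Big) = 0$ a.s.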

3.3.2. Testing of the criteria 

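In this paragraph, through a test of our criteria, namely $a \mapsto \Phi\Big(\check g_n^{(k-1)}\frac{f_{a,n}}{[\check g^{(k-1)}]_{a,n}}, f_n\Big)$, we will build a
stopping rule for this procedure.

First, the next theorem enables us to derive the law of our criteria:

Theorem 3.5. For a fixed $k$, we have

$$\sqrt{n}\,\big(Var_P(M(\check c_n(\check\gamma_n), \check\gamma_n))\big)^{-1/2}\big(P_nM(\check c_n(\check\gamma_n), \check\gamma_n) - P_nM(a_k,a_k)\big) \xrightarrow{\mathcal{L}aw} \mathcal{N}(0, I),$$

where $k$ represents the $k^{th}$ step of our algorithm and where $I$ is the identity matrix in $\mathbb{R}^d$.

Note that $k$ is fixed in theorem 3.5 since $\check\gamma_n = \arg\inf_{a\in\Theta}\sup_{c\in\Theta} P_nM(c,a)$ where $M$ is a
known function of $k$ - see section 3.1.1. Thus, in the case when $\Phi\Big(g^{(k-1)}\frac{f_{a_k}}{g^{(k-1)}_{a_k}}, f\Big) = 0$, we obtain

Corollary 3.2. We have $\sqrt{n}\,\big(Var_P(M(\check c_n(\check\gamma_n), \check\gamma_n))\big)^{-1/2}P_nM(\check c_n(\check\gamma_n), \check\gamma_n) \xrightarrow{\mathcal{L}aw} \mathcal{N}(0, I)$.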
Hence, we propose the test of the null hypothesis

$(H_0)$ : $\Phi\Big(g^{(k-1)}\frac{f_{a_k}}{g^{(k-1)}_{a_k}}, f\Big) = 0$ versus the alternative $(H_1)$ : $\Phi\Big(g^{(k-1)}\frac{f_{a_k}}{g^{(k-1)}_{a_k}}, f\Big) \neq 0$.

Based on this result, we stop the algorithm, then, defining $a_k$ as the last vector generated, we
derive from corollary 3.2 an $\alpha$-level confidence ellipsoid around $a_k$, namely

$$\mathcal{E}_k = \{b \in \mathbb{R}^d;\ \sqrt{n}\,(Var_P(M(b,b)))^{-1/2}P_nM(b,b) \le q_\alpha^{\mathcal{N}(0,1)}\}$$

where $q_\alpha^{\mathcal{N}(0,1)}$ is the quantile of an $\alpha$-level reduced centered normal distribution and where $P_n$ is
the empirical measure arising from a realization of the sequences $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$.
Consequently, the following corollary provides us with a confidence region for the above test:

Corollary 3.3. $\mathcal{E}_k$ is a confidence region for the test of the null hypothesis $(H_0)$ versus $(H_1)$.

3.4. Goodness-of-fit test for copulas 

Let us begin with studying the following case: 
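Let $f$ be a density defined on $\mathbb{R}^2$ and let $g$ be an Elliptical distribution with same mean and
variance as $f$. Let us first assume that our algorithm leads us to having $\Phi(g^{(2)},f) = 0$ where family
$(a_i)$ is the canonical basis of $\mathbb{R}^2$. Hence, we have $g^{(2)}(x) = g(x)\frac{f_1}{g_1}\frac{f_2}{g^{(1)}_2} = g(x)\frac{f_1}{g_1}\frac{f_2}{g_2}$ - through lemma
F.7 page 26 - and $g^{(2)} = f$. Therefore, $f = g(x)\frac{f_1}{g_1}\frac{f_2}{g_2}$, i.e. $\frac{f}{f_1f_2} = \frac{g}{g_1g_2}$, and then

$$\frac{\partial^2}{\partial x\,\partial y}C_f = \frac{\partial^2}{\partial x\,\partial y}C_g,$$

where $C_f$ (resp. $C_g$) is the copula of $f$ (resp. $g$).

At present, let $f$ be a density on $\mathbb{R}^d$ and let $g$ be the density defined in section 2.1.2.
Let us assume that our algorithm implies that $\Phi(g^{(d)},f) = 0$.

Hence, we have, for any $x \in \mathbb{R}^d$, $g(x)\prod_{k=1}^{d}\frac{f_{a_k}(a_k^\top x)}{[g^{(k-1)}]_{a_k}(a_k^\top x)} = f(x)$, i.e. $\frac{g(x)}{\prod_{k=1}^{d}g_{a_k}(a_k^\top x)} = \frac{f(x)}{\prod_{k=1}^{d}f_{a_k}(a_k^\top x)}$, since
lemma F.7 page 26 implies that $g^{(k-1)}_{a_k} = g_{a_k}$ if $k \le d$.

Moreover, the family $(a_i)_{i=1\ldots d}$ is a basis of $\mathbb{R}^d$ - see lemma F.8 page 26. Hence, putting $A =
(a_1, \ldots, a_d)$ and defining vector $y$ (resp. density $\tilde f$, copula $\tilde C_f$ of $\tilde f$, density $\tilde g$, copula $\tilde C_g$ of $\tilde g$) as
the expression of vector $x$ (resp. density $f$, copula $C_f$ of $f$, density $g$, copula $C_g$ of $g$) in basis $A$,
the above equality implies

$$\frac{\partial^d}{\partial y_1\ldots\partial y_d}\tilde C_f = \frac{\partial^d}{\partial y_1\ldots\partial y_d}\tilde C_g.$$

Finally, we perform a statistical test of the null hypothesis $(H_0)$ : $\frac{\partial^d}{\partial y_1\ldots\partial y_d}\tilde C_f = \frac{\partial^d}{\partial y_1\ldots\partial y_d}\tilde C_g$ versus
the alternative $(H_1)$ : $\frac{\partial^d}{\partial y_1\ldots\partial y_d}\tilde C_f \neq \frac{\partial^d}{\partial y_1\ldots\partial y_d}\tilde C_g$. Since, under $(H_0)$, we have $\Phi(g^{(d)},f) = 0$, then, as
explained in section 3.3.2, corollary 3.3 provides us with a confidence region for our test.

Theorem 3.6. Keeping the notations of corollary 3.3, we infer that $\mathcal{E}_d$ is a confidence region for
the test of the null hypothesis $(H_0)$ versus the alternative hypothesis $(H_1)$.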
3.5. Rewriting of the convolution product

In the present paper, we first elaborated an algorithm aiming at isolating several known
structures from initial data. Our objective was to verify if, for a known density $f$ on $\mathbb{R}^d$, a known density
$n$ on $\mathbb{R}^{d-j}$ such that, for $d > 1$,

$$f(x) = n(a_{j+1}^\top x, \ldots, a_d^\top x)\,h(a_1^\top x, \ldots, a_j^\top x), \qquad (3.1)$$

did indeed exist, with $j < d$, with $(a_1, \ldots, a_d)$ being a basis of $\mathbb{R}^d$ and with $h$ being a density on
$\mathbb{R}^j$.

Secondly, our next step consisted in building an estimate (resp. a representation) of $f$ without
necessarily assuming that $f$ meets relationship (3.1) - see theorem 3.4.

Consequently, let us consider $Z_1$ and $Z_2$, two random vectors with respective densities $h_1$ and $h_2$
- which is Elliptical - on $\mathbb{R}^d$. Let us consider a random vector $X$ such that $X = Z_1 + Z_2$ and let $f$
be its density. This density can then be written as:

$$f(x) = h_1 * h_2(x) = \int_{\mathbb{R}^d} h_1(t)\,h_2(x - t)\,dt.$$
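
As a small numerical illustration of the convolution formula $h_1 * h_2(x) = \int h_1(t)h_2(x-t)dt$, the sketch below convolves two univariate Gaussian densities with a midpoint rule and compares the result with the known closed form (means and variances add). The densities, integration bounds and function names are assumptions chosen for the example, not quantities from the article.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def convolve_at(h1, h2, x, lower=-30.0, upper=30.0, steps=6000):
    """Midpoint-rule approximation of integral h1(t) * h2(x - t) dt on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        t = lower + (i + 0.5) * h
        total += h1(t) * h2(x - t)
    return total * h

def h1(t):
    # Hypothetical density of Z_1: N(0, 1).
    return normal_pdf(t, 0.0, 1.0)

def h2(t):
    # Hypothetical (elliptical) density of Z_2: N(1, 4).
    return normal_pdf(t, 1.0, 4.0)

# Density of X = Z_1 + Z_2 at x = 0.7, to be compared with the N(1, 5) density.
value = convolve_at(h1, h2, 0.7)
closed_form = normal_pdf(0.7, 1.0, 5.0)
```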

Then, the following property enables us to represent / under the form of a product and without 
the integral sign 

Proposition 3.4. Let $\phi$ be a centered Elliptical density with $\sigma^2.I_d$, $\sigma^2 > 0$, as covariance matrix,
such that it is a product density in all orthogonal coordinate systems and such that its
characteristic function $s \mapsto \Psi(\frac{1}{2}|s|^2\sigma^2)$ is integrable - see Landsman (2003).
Let $f$ be a density on $\mathbb{R}^d$ which can be deconvoluted with $\phi$, i.e.

$$f = \bar f * \phi = \int_{\mathbb{R}^d} \bar f(t)\,\phi(x - t)\,dt,$$

where $\bar f$ is some density on $\mathbb{R}^d$.

Let $g^{(0)}$ be the Elliptical density belonging to the same Elliptical family as $f$ and having same
mean and variance as $f$.

Then, the sequence $(g^{(k)})_k$ converges uniformly a.s. and in $L^1$ towards $f$ in $k$, i.e.

$$\lim_{k\to\infty}\sup_{x\in\mathbb{R}^d}|g^{(k)}(x) - f(x)| = 0, \quad\text{and}\quad \lim_{k\to\infty}\int_{\mathbb{R}^d}|g^{(k)}(x) - f(x)|\,dx = 0.$$

Finally, with the notations of section 3.3 and of proposition 3.4, the following theorem enables
us to estimate any convolution product of a multivariate Elliptical density $\phi$ with a continuous
density $\bar f$:

Theorem 3.7. It holds $\lim_n \lim_k \check g_n^{(k)} = \bar f * \phi$ a.s.

3.6. On the regression
3.6. On the regression 

In this section, we will study several applications of our algorithm pertaining to the regression
analysis. We define $(X_1, \ldots, X_d)$ (resp. $(Y_1, \ldots, Y_d)$) as a vector with density $f$ (resp. $g$).

Remark 3.3. In this paragraph, we will work in the $L^2$ space. Then, we will first only consider
the $\Phi$-divergences which are greater than or equal to the $L^2$ distance - see Vajda (1973). Note
also that the co-vectors of $f$ can be obtained in the $L^2$ space - see lemma F.6 and proposition
B.1.
3.6.1. The basic idea

In this paragraph, we will assume that $\Theta = \mathbb{R}^2$ and that our algorithm stops for $j = 1$ and
$a_1 = (0,1)'$. The following theorem provides us with the regression of $X_1$ on $X_2$:

Theorem 3.8. The probability measure of $X_1$ given $X_2$ is the same as the probability measure of
$Y_1$ given $Y_2$. Moreover, the regression between $X_1$ and $X_2$ is

$$X_1 = E(Y_1/Y_2) + \varepsilon,$$

where $\varepsilon$ is a centered random variable orthogonal to $E(X_1/X_2)$.



Remark 3.4. This theorem implies that $E(X_1/X_2) = E(Y_1/Y_2)$. This equation can be used in
many fields of research. The Markov chain theory has been used for instance in example 1.2.
Moreover, if $g$ is a Gaussian density with same mean and variance as $f$, then Saporta (2006)
implies that $E(Y_1/Y_2) = E(Y_1) + \frac{Cov(Y_1,Y_2)}{Var(Y_2)}(Y_2 - E(Y_2))$ and then

$$X_1 = E(Y_1) + \frac{Cov(Y_1,Y_2)}{Var(Y_2)}(Y_2 - E(Y_2)) + \varepsilon.$$
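
The Gaussian regression formula $E(Y_1/Y_2) = E(Y_1) + \frac{Cov(Y_1,Y_2)}{Var(Y_2)}(Y_2 - E(Y_2))$ can be evaluated from empirical moments, as in the following minimal sketch. The function name and the toy data are hypothetical; the moments are plain empirical averages, which is one possible choice among others.

```python
def gaussian_regression(pairs):
    """Return t -> E(Y1) + Cov(Y1, Y2) / Var(Y2) * (t - E(Y2)), using empirical moments.

    `pairs` is a list of observations (y1, y2).
    """
    n = len(pairs)
    mean1 = sum(y1 for y1, _ in pairs) / n
    mean2 = sum(y2 for _, y2 in pairs) / n
    cov = sum((y1 - mean1) * (y2 - mean2) for y1, y2 in pairs) / n
    var2 = sum((y2 - mean2) ** 2 for _, y2 in pairs) / n
    slope = cov / var2
    return lambda t: mean1 + slope * (t - mean2)

# Toy data lying exactly on the line y1 = 2 * y2 + 1.
pairs = [(2.0 * t + 1.0, t) for t in (0.0, 1.0, 2.0, 3.0)]
predict = gaussian_regression(pairs)
prediction = predict(5.0)
```

On data that lie exactly on a line, the fitted conditional mean recovers that line; with a sample drawn from $g$, the same function gives an empirical version of $E(Y_1/Y_2)$.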

3.6.2. General case 

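In this paragraph, we will assume that $\Theta = \mathbb{R}^d$ and that our algorithm stops with $j$ for $j < d$.
Lemma F.9 implies the existence of an orthogonal and free family $(b_i)_{i=j+1,\ldots,d}$ of $\mathbb{R}^d_*$ such that
$\mathbb{R}^d = Vect\{a_i\} \overset{\perp}{\oplus} Vect\{b_k\}$ and such that

$$g(b_{j+1}^\top x, \ldots, b_d^\top x / a_1^\top x, \ldots, a_j^\top x) = f(b_{j+1}^\top x, \ldots, b_d^\top x / a_1^\top x, \ldots, a_j^\top x). \qquad (3.2)$$

Hence, the following theorem provides us with the regression of $b_k^\top X$, $k = 1, \ldots, d$, on $(a_1^\top X, \ldots, a_j^\top X)$:

Theorem 3.9. The probability measure of $(b_{j+1}^\top X, \ldots, b_d^\top X)$ given $(a_1^\top X, \ldots, a_j^\top X)$ is the same as
the probability measure of $(b_{j+1}^\top Y, \ldots, b_d^\top Y)$ given $(a_1^\top Y, \ldots, a_j^\top Y)$. Moreover, the regression of $b_k^\top X$,
$k = 1, \ldots, d$, on $(a_1^\top X, \ldots, a_j^\top X)$ is $b_k^\top X = E(b_k^\top Y / a_1^\top Y, \ldots, a_j^\top Y) + b_k^\top\varepsilon$, where $\varepsilon$ is a centered random
vector such that $b_k^\top\varepsilon$ is orthogonal to $E(b_k^\top X / a_1^\top X, \ldots, a_j^\top X)$.

Corollary 3.4. If $g$ is a Gaussian density with same mean and variance as $f$, and if $Cov(X_i, X_j) =
0$ for any $i \neq j$, then the regression of $b_k^\top X$, $k = 1, \ldots, d$, on $(a_1^\top X, \ldots, a_j^\top X)$ is $b_k^\top X = E(b_k^\top Y) + b_k^\top\varepsilon$,
where $\varepsilon$ is a centered random vector such that $b_k^\top\varepsilon$ is orthogonal to $E(b_k^\top X / a_1^\top X, \ldots, a_j^\top X)$.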



4. Simulations 

Let us study four examples. The first involves a $\chi^2$-divergence, the second a Hellinger distance,
the third a Cressie-Read divergence (still with $\gamma = 1.25$) and the fourth a Kullback-Leibler
divergence.

In each example, our program will follow our algorithm and will aim at creating a sequence of
densities $(g^{(j)})$, $j = 1, \ldots, k$, $k < d$, such that $g^{(0)} = g$, $g^{(j)} = g^{(j-1)} f_{a_j} / [g^{(j-1)}]_{a_j}$ and $\Phi(g^{(k)}, f) = 0$,
with $\Phi$ being a divergence and $a_j = \arg\inf_b \Phi(g^{(j-1)} f_b / [g^{(j-1)}]_b, f)$, for all $j = 1, \ldots, k$. Moreover,
in the second example, we will study the robustness of our method with two outliers. In the third
example, defining $(X_1, X_2)$ as a vector with $f$ as density, we will study the regression of $X_1$ on
$X_2$. And finally, in the fourth example, we will perform our goodness-of-fit test for copulas.
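
To make the recursion $g^{(j)} = g^{(j-1)} f_{a_j} / [g^{(j-1)}]_{a_j}$ concrete, here is a minimal Python sketch of a single update step. It is not the program used in the simulations: the marginal densities are supplied in closed form instead of being estimated by kernels, the Gaussian starting density is taken with independent components, and the helper names (`update_density`, `gumbel_pdf`, `normal_pdf`) as well as the Gaussian parameters are assumptions made for the illustration.

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gumbel_pdf(x, loc=0.0, scale=1.0):
    z = (x - loc) / scale
    return math.exp(-(z + math.exp(-z))) / scale

def update_density(g, f_a, g_a, a):
    """Return the density x -> g(x) * f_a(a.x) / g_a(a.x) (one step of the recursion)."""
    def g_next(x):
        t = sum(ai * xi for ai, xi in zip(a, x))   # projection a^T x
        return g(x) * f_a(t) / g_a(t)
    return g_next

# Toy setting (assumed): f = Gumbel(x0) * Normal(x1), g = Normal(x0) * Normal(x1).
f = lambda x: gumbel_pdf(x[0], -5.0, 1.0) * normal_pdf(x[1])
g = lambda x: normal_pdf(x[0], -4.4, 1.3) * normal_pdf(x[1])

a1 = (1.0, 0.0)
g1 = update_density(
    g,
    f_a=lambda t: gumbel_pdf(t, -5.0, 1.0),   # marginal of f along a1
    g_a=lambda t: normal_pdf(t, -4.4, 1.3),   # marginal of g along a1
    a=a1,
)
```

In this toy setting a single step along $a_1 = (1, 0)$ replaces the Gaussian marginal by the Gumbel one, so `g1` coincides with `f` and the procedure would stop with $k = 1$.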
Simulation 4.1 (With the $\chi^2$ divergence).

We are in dimension 3 (= d), and we consider a sample of 50 (= n) values of a random variable $X$
with a density law $f$ defined by:

$$f(x) = Gaussian(x_1 + x_2) \cdot Gaussian(x_0 + x_2) \cdot Gumbel(x_0 + x_1),$$

where the Normal law parameters are (-5, 2) and (1, 1) and where the Gumbel distribution
parameters are -3 and 4. Let us then generate a Gaussian random variable $Y$, whose density we will
name $g$, with the same mean and variance as $f$.

We theoretically obtain $k = 1$ and $a_1 = (1, 1, 0)$. To get this result, we perform the following test:

$(H_0)$: $a_1 = (1, 1, 0)$ versus $(H_1)$: $a_1 \neq (1, 1, 0)$.

Then, corollary 3.3 enables us to estimate $a_1$ by the following 0.9 (= $\alpha$) level confidence ellipsoid

$$\mathcal{E}_1 = \{b \in \mathbb{R}^3;\ (Var_P(M(b,b)))^{-1/2} P_n M(b,b) \le q_\alpha^{N(0,1)} / \sqrt{n} \simeq 0.2533/7.0710678 = 0.03582203\}.$$

And, we obtain

Our Algorithm
  Projection Study:   minimum:   0.0201741
                      at point:  (1.00912, 1.09453, 0.01893)
                      P-Value:   0.81131
  Test:               $H_0$: $a_1 \in \mathcal{E}_1$: True
  $\chi^2$(Kernel Estimation of $g^{(1)}$, $g^{(1)}$):   6.1726

Therefore, we conclude that $f = g^{(1)}$.
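
The numerical bound of the confidence ellipsoid is simply the quantile divided by $\sqrt{n}$. The short Python sketch below reproduces that arithmetic with the quantile value 0.2533 quoted in the text, for $n = 50$ (this simulation) and $n = 100$ (Simulation 4.2); the function name `ellipsoid_threshold` is a hypothetical helper, not part of the authors' program.

```python
import math

def ellipsoid_threshold(quantile, n):
    """Right-hand side q / sqrt(n) of the confidence-ellipsoid inequality."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return quantile / math.sqrt(n)

# The quantile 0.2533 and the sample sizes are the values quoted in the simulations.
threshold_n50 = ellipsoid_threshold(0.2533, 50)
threshold_n100 = ellipsoid_threshold(0.2533, 100)
print(round(threshold_n50, 8), round(threshold_n100, 5))
```

A candidate direction $b$ is then accepted when its standardised criterion $(Var_P(M(b,b)))^{-1/2} P_n M(b,b)$ does not exceed this bound.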



Simulation 4.2 (With the Hellinger distance H).

We are in dimension 20 (= d). We first generate a sample with 100 (= n) observations, namely two
outliers $x = (2, 0, \ldots, 0)$ and 98 values of a random variable $X$ with a density law $f$ defined
by $f(x) = Gumbel(x_0) \cdot Normal(x_1, \ldots, x_{19})$, where the Gumbel law parameters are -5 and 1 and
where the normal distribution is reduced and centered.
Our reasoning is the same as in Simulation 4.1.

In the first part of the program, we theoretically obtain $k = 1$ and $a_1 = (1, 0, \ldots, 0)$. To get this
result, we perform the following test $(H_0)$: $a_1 = (1, 0, \ldots, 0)$ versus $(H_1)$: $a_1 \neq (1, 0, \ldots, 0)$.
We estimate $a_1$ by the following 0.9 (= $\alpha$) level confidence ellipsoid

$$\mathcal{E}_1 = \{b \in \mathbb{R}^{20};\ (Var_P(M(b,b)))^{-1/2} P_n M(b,b) \le q_\alpha^{N(0,1)} / \sqrt{n} \simeq 0.02533\}.$$



And, we obtain

Our Algorithm
  Projection Study:   minimum:   0.002692
                      at point:  (1.01326, 0.0657, 0.0628, 0.1011, 0.0509, 0.1083,
                                 0.1261, 0.0573, 0.0377, 0.0794, 0.0906, 0.0356, 0.0012,
                                 0.0292, 0.0737, 0.0934, 0.0286, 0.1057, 0.0697, 0.0771)
                      P-Value:   0.80554
  Test:               $H_0$: $a_1 \in \mathcal{E}_1$: True
  H(Estimate of $g^{(1)}$, $g^{(1)}$):   3.042174

Therefore, we conclude that $f = g^{(1)}$.



Simulation 4.3 (With the Cressie-Read divergence ($\Phi$)).

We are in dimension 2 (= d), and we consider a sample of 50 (= n) values of a random variable
$X = (X_1, X_2)$ with a density law $f$ defined by $f(x) = Gumbel(x_0) \cdot Normal(x_1)$, where the Gumbel
law parameters are -5 and 1 and where the normal distribution parameters are (0, 1). Let us
then generate a Gaussian random variable $Y$, whose density we will name $g$, with the
same mean and variance as $f$.

We theoretically obtain $k = 1$ and $a_1 = (1, 0)$. To get this result, we perform the following test:

$(H_0)$: $a_1 = (1, 0)$ versus $(H_1)$: $a_1 \neq (1, 0)$.

Then, corollary 3.3 enables us to estimate $a_1$ by the following 0.9 (= $\alpha$) level confidence ellipsoid

$$\mathcal{E}_1 = \{b \in \mathbb{R}^2;\ (Var_P(M(b,b)))^{-1/2} P_n M(b,b) \le q_\alpha^{N(0,1)} / \sqrt{n} \simeq 0.03582203\}.$$



And, we obtain

Our Algorithm
  Projection Study:   minimum:   0.0210058
                      at point:  (1.001, 0.0014)
                      P-Value:   0.989552
  Test:               $H_0$: $a_1 \in \mathcal{E}_1$: True
  $\Phi$(Kernel Estimation of $g^{(1)}$, $g^{(1)}$):   6.47617

Therefore, we conclude that $f = g^{(1)}$.



Figure 1: Graph of the distribution to estimate (red) and of our own estimate (green).

Figure 2: Graph of the distribution to estimate (red) and of Huber's estimate (green).



At present, keeping the notations of this simulation, let us study the regression of $X_1$ on $X_2$.
Our algorithm leads us to infer that the density of $X_1$ given $X_2$ is the same as the density of $Y_1$
given $Y_2$. Moreover, property A.1 implies that the co-factors of $f$ are the same with all divergences.
Consequently, we can use theorem 3.8, i.e. it implies that $X_1 = E(Y_1 / Y_2) + \varepsilon$, where $\varepsilon$ is
a centered random variable orthogonal to $E(X_1 / X_2)$. Thus, since $g$ is a Gaussian density, remark
3.4 implies that

$$X_1 = E(Y_1) + \frac{Cov(Y_1, Y_2)}{Var(Y_2)}(Y_2 - E(Y_2)) + \varepsilon.$$

Now, using the least squares method, we estimate $a_1$ and $a_2$ such that $X_1 = a_1 + a_2 X_2 + \varepsilon$.
Thus, the following table presents the results of our regression and of the least squares method if
we assume that $\varepsilon$ is Gaussian.



Our Regression
  $E(Y_1)$                                  -4.545483
  $Cov(Y_1, Y_2)$                            0.0380534
  $Var(Y_2)$                                 0.9190052
  $E(Y_2)$                                   0.3103752
  correlation coefficient $(Y_1, Y_2)$       0.02158213

Least squares method
  $a_1$                                     -4.34159227
  $a_2$                                      0.06803317
  correlation coefficient $(X_1, X_2)$       0.04888484

Figure 3: Graph of the regression of X1 on X2 based on the least squares method (red) and based on our theory (green).
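
The two fitted lines can be compared directly from the figures reported in the table of this simulation. The Python sketch below is only a worked piece of arithmetic: it turns the moments of $Y$ into an intercept and a slope, reading the regressor of the Gaussian formula as $x_2$ (an interpretive assumption), and evaluates the gap with the least squares line at three arbitrary abscissae.

```python
# Values copied from the regression table of Simulation 4.3.
mean_y1, mean_y2 = -4.545483, 0.3103752
cov_y1_y2, var_y2 = 0.0380534, 0.9190052
ls_intercept, ls_slope = -4.34159227, 0.06803317

# Line implied by the Gaussian formula: x1 = E(Y1) + Cov/Var * (x2 - E(Y2)).
slope = cov_y1_y2 / var_y2
intercept = mean_y1 - slope * mean_y2

def our_line(x2):
    return intercept + slope * x2

def least_squares_line(x2):
    return ls_intercept + ls_slope * x2

# Absolute gap between both lines at a few arbitrary points.
gaps = [abs(our_line(x2) - least_squares_line(x2)) for x2 in (-2.0, 0.0, 2.0)]
print(round(slope, 6), round(intercept, 5), [round(gap, 3) for gap in gaps])
```

Both slopes are close to zero, which is in line with the small correlation coefficients of the table; the lines differ mainly through their intercepts.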

Simulation 4.4 (With the relative entropy K).

We are in dimension 2 (= d), and we use the relative entropy to perform our optimisations. Let us
consider a sample of 50 (= n) values of a random variable $X$ with a density law $f$ defined by:

$$f(x) = c_\rho(F_{Gumbel}(x_0), F_{Exponential}(x_1)) \cdot Gumbel(x_0) \cdot Exponential(x_1),$$

where:

• $c_\rho$ is the Gaussian copula with correlation coefficient $\rho = 0.5$,

• the Gumbel distribution parameters are -1 and 1 and

• the Exponential density parameter is 2.

Let us then generate a Gaussian random variable $Y$, whose density we will name $g$, with
the same mean and variance as $f$.

We theoretically obtain $k = 2$ and $(a_1, a_2) = ((1, 0), (0, 1))$. To get this result, we perform the
following test:

$(H_0)$: $(a_1, a_2) = ((1, 0), (0, 1))$ versus $(H_1)$: $(a_1, a_2) \neq ((1, 0), (0, 1))$.

Then, theorem 3.6 enables us to verify $(H_0)$ by the following 0.9 (= $\alpha$) level confidence ellipsoid

$$\mathcal{E}_2 = \{b \in \mathbb{R}^2;\ (Var_P(M(b,b)))^{-1/2} P_n M(b,b) \le q_\alpha^{N(0,1)} / \sqrt{n} \simeq 0.2533/7.0710678 = 0.0358220\}.$$



And, we obtain

Our Algorithm
  Projection Study number 0:   minimum:   0.445199
                               at point:  (1.0142, 0.0026)
                               P-Value:   0.94579
  Test:                        $H_0$: $a_1 \in \mathcal{E}_1$: False
  Projection Study number 1:   minimum:   0.0263
                               at point:  (0.0084, 0.9006)
                               P-Value:   0.97101
  Test:                        $H_0$: $a_2 \in \mathcal{E}_2$: True
  K(Kernel Estimation of $g^{(2)}$, $g^{(2)}$):   4.0680

Therefore, we can conclude that $H_0$ is verified.




Figure 4: Graph of the estimate of $(x_0, x_1) \mapsto c_\rho(F_{Gumbel}(x_0), F_{Exponential}(x_1))$.



Critique of the simulations

In the case where $f$ is unknown, we will never be sure to have reached the minimum of the
$\Phi$-divergence: we have indeed used the simulated annealing method to solve our optimisation
problem, and therefore it is only when the number of random jumps tends in theory towards
infinity that the probability of reaching the minimum tends to 1. We also note that there is no theory
on the optimal number of jumps to implement, as this number depends on the specificities of
each particular problem.
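
For readers unfamiliar with the method, a bare-bones simulated annealing loop can be sketched as follows in Python. This is a generic textbook-style sketch, not the authors' implementation: the cooling schedule, the Gaussian jump proposal, the one-dimensional toy criterion `toy` and all parameter values are assumptions chosen for the illustration.

```python
import math
import random

def simulated_annealing(objective, start, n_jumps, step=0.5, t0=1.0, seed=0):
    """Minimise `objective` over the reals with a basic annealing schedule."""
    rng = random.Random(seed)
    current, current_val = start, objective(start)
    best, best_val = current, current_val
    for k in range(1, n_jumps + 1):
        temperature = t0 / k                       # simple cooling schedule (assumed)
        candidate = current + rng.gauss(0.0, step)  # random jump
        candidate_val = objective(candidate)
        delta = candidate_val - current_val
        # Always accept improvements; accept deteriorations with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_val = candidate, candidate_val
            if current_val < best_val:
                best, best_val = current, current_val
    return best, best_val

# Stand-in criterion with several local minima (not the divergence of the article).
toy = lambda b: (b - 1.0) ** 2 + 0.3 * math.sin(8.0 * b)
best_b, best_val = simulated_annealing(toy, start=-3.0, n_jumps=2000)
```

The returned value is only the best point visited within the allotted number of jumps, which is precisely the limitation discussed above: no finite `n_jumps` guarantees that the global minimum has been reached.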

Moreover, we choose $50^{-\frac{4}{4+d}}$ (resp. $100^{-\frac{4}{4+d}}$) for the AMISE of simulations 4.1, 4.2 and 4.3
(resp. simulation 4.4). This choice leads us to simulate 50 (resp. 100) random variables - see
Scott (1992), page 151 -, none of which have been discarded to obtain the truncated sample.



Finally, we remark that our method has two key advantages over Huber's: since there exist
divergences smaller than the relative entropy, our method requires a considerably shorter
computation time, and it is also more robust.

Conclusion

Projection Pursuit is useful in evidencing characteristic structures as well as one-dimensional
projections and their associated distributions in multivariate data. Huber (1985) shows us how to
achieve it through maximization of the relative entropy.

The present article shows that our $\Phi$-divergence method constitutes a good alternative to Huber's,
particularly in terms of regression and robustness as well as in terms of the study of copulas. Indeed,
the convergence results and the simulations we carried out convincingly fulfilled our expectations
regarding our methodology.



A. Reminders 

A.1. $\Phi$-Divergence

Let us call $h_a$ the density of $a^\top Z$ if $h$ is the density of $Z$. Let $\varphi$ be a strictly convex function
defined by $\varphi: \mathbb{R}^+ \to \mathbb{R}^+$, and such that $\varphi(1) = 0$.

Definition A.1. We define the $\Phi$-divergence of $P$ from $Q$, where $P$ and $Q$ are two probability
distributions over a space $\Omega$ such that $Q$ is absolutely continuous with respect to $P$, by

$$\Phi(Q, P) = \int \varphi\left(\frac{dQ}{dP}\right) dP. \quad (A.1)$$

The above expression (A.1) is also valid if $P$ and $Q$ are both dominated by the same probability.

The most used distances (Kullback, Hellinger or $\chi^2$) belong to the Cressie-Read family
(see Cressie-Read (1984), Csiszár I. (1967) and the books of Friedrich and Igor (1987), Pardo Leandro
(2006) and Zografos K. (1990)). They are defined by a specific $\varphi$. Indeed,

- with the relative entropy, we associate $\varphi(x) = x\ln(x) - x + 1$

- with the Hellinger distance, we associate $\varphi(x) = 2(\sqrt{x} - 1)^2$

- with the $\chi^2$ distance, we associate $\varphi(x) = \frac{1}{2}(x - 1)^2$

- more generally, with power divergences, we associate $\varphi(x) = \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)}$, where $\gamma \in \mathbb{R} \setminus \{0, 1\}$

- and, finally, with the $L^1$ norm, which is also a divergence, we associate $\varphi(x) = |x - 1|$.

In particular we have the following inequalities:

$$d_{L^1}(g, f) \le K(g, f) \le \chi^2(g, f).$$
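
As a small numerical companion to the definition (A.1), the Python sketch below evaluates a discrete analogue $\sum_i p_i \varphi(q_i / p_i)$ for the functions $\varphi$ listed above. The discretisation, the dictionary `PHI`, the helper `phi_divergence` and the two probability vectors are assumptions made for the illustration; the sketch does not attempt to check the inequalities stated above.

```python
import math

# Generating functions listed above; phi(0) for the relative entropy is taken as its limit 1.
PHI = {
    "relative_entropy": lambda x: x * math.log(x) - x + 1.0 if x > 0 else 1.0,
    "hellinger": lambda x: 2.0 * (math.sqrt(x) - 1.0) ** 2,
    "chi2": lambda x: 0.5 * (x - 1.0) ** 2,
    "l1": lambda x: abs(x - 1.0),
}

def phi_divergence(q, p, phi):
    """Discrete analogue of (A.1): sum_i p_i * phi(q_i / p_i), assuming every p_i > 0."""
    return sum(pi * phi(qi / pi) for qi, pi in zip(q, p))

# Two hypothetical distributions on a two-point space.
p = [0.5, 0.5]
q = [0.6, 0.4]
values = {name: phi_divergence(q, p, phi) for name, phi in PHI.items()}
same = {name: phi_divergence(p, p, phi) for name, phi in PHI.items()}
```

Every entry of `same` is zero because each $\varphi$ vanishes at 1, which is the discrete counterpart of the fact that a divergence is null between identical distributions.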
Let us now present some well-known properties of divergences. 

Property A.1. We have $\Phi(P, Q) = 0 \Leftrightarrow P = Q$.

Property A.2. The application $Q \mapsto \Phi(Q, P)$ is greater than the $L^1$ distance, convex, lower
semi-continuous (l.s.c.) - for the topology that makes all the applications of the form $Q \mapsto \int f dQ$
continuous where $f$ is bounded and continuous - as well as l.s.c. for the topology of the uniform
convergence.

Property A.3 (corollary (1.29), page 19 of Friedrich and Igor (1987)). If $T: (X, A) \to (Y, B)$ is
measurable and if $K(P, Q) < \infty$, then $K(P, Q) \ge K(PT^{-1}, QT^{-1})$, with equality being reached
when $T$ is surjective for $(P, Q)$.

Theorem A.1 (theorem III.4 of Azé (1997)). Let $f: I \to \mathbb{R}$ be a convex function. Then $f$ is a
Lipschitz function in all compact intervals $[a, b] \subset int\{I\}$. In particular, $f$ is continuous on $int\{I\}$.

A.2. Useful lemmas

Through a reductio ad absurdum argument, we derive lemmas A.1 and A.2:

Lemma A.1. Let $f$ be a density in $\mathbb{R}^d$ bounded and positive. Then, any projection density of $f$ -
that we will name $f_a$, with $a \in \mathbb{R}^d_*$ - is also bounded and positive in $\mathbb{R}$.

Lemma A.2. Let $f$ be a density in $\mathbb{R}^d$ bounded and positive. Then any density $f(\cdot / a^\top x)$, for any
$a \in \mathbb{R}^d_*$, is also bounded and positive.

By induction and from lemmas A.1 and A.2 we have

Lemma A.3. If $f$ and $g$ are positive and bounded densities, then $g^{(k)}$ is positive and bounded.

Finally we introduce a last lemma

Lemma A.4. Let $f$ be an absolutely continuous density, then, for all sequences $(a_n)$ tending to $a$
in $\mathbb{R}^d_*$, the sequence $f_{a_n}$ uniformly converges towards $f_a$.

Proof. For all $a$ in $\mathbb{R}^d_*$, let $F_a$ be the cumulative distribution function of $a^\top X$ and $\psi_a$ be a complex
function defined by $\psi_a(u, v) = F_a(Re(u + iv)) + iF_a(Re(v + iu))$, for all $u$ and $v$ in $\mathbb{R}$.
First, the function $\psi_a(u, v)$ is an analytic function, because $x \mapsto f_a(a^\top x)$ is continuous. As
a result of the corollary of Dini's second theorem - according to which "A sequence of cumulative
distribution functions which pointwise converges on $\mathbb{R}$ towards a continuous cumulative
distribution function $F$ on $\mathbb{R}$, uniformly converges towards $F$ on $\mathbb{R}$" - we deduce that, for all
sequences $(a_n)$ converging towards $a$, $\psi_{a_n}$ uniformly converges towards $\psi_a$. Finally, the Weierstrass
theorem (see proposition (10.1) page 220 of the "Calcul infinitésimal" book of Jean Dieudonné)
implies that all sequences $\psi'_{a_n}$ uniformly converge towards $\psi'_a$, for all $a_n$ tending to $a$. We can
therefore conclude. □



B. Study of the sample 

Let $X_1, X_2, \ldots, X_m$ be a sequence of independent random vectors with same density $f$. Let $Y_1,
Y_2, \ldots, Y_m$ be a sequence of independent random vectors with same density $g$. Then, the kernel
estimators $f_m$, $g_m$, $f_{a,m}$ and $g_{a,m}$ of $f$, $g$, $f_a$ and $g_a$, for all $a \in \mathbb{R}^d_*$, almost surely and uniformly
converge since we assume that the bandwidth $h_m$ of these estimators meets the following conditions
(see Bosq (1999)):

(Hyp): $h_m \searrow_m 0$, $m h_m \nearrow_m \infty$, $m h_m / L(h_m^{-1}) \to_m \infty$ and $L(h_m^{-1}) / LLm \to_m \infty$,
with $L(u) = \ln(u \vee e)$.
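Let us consider

$B_1(n, a) = \frac{1}{n} \sum_{i=1}^{n} \varphi'\left\{\frac{f_{a,n}(a^\top Y_i)}{g_{a,n}(a^\top Y_i)} \frac{g_n(Y_i)}{f_n(Y_i)}\right\} \frac{f_{a,n}(a^\top Y_i)}{g_{a,n}(a^\top Y_i)}$ and $B_2(n, a) = \frac{1}{n} \sum_{i=1}^{n} \varphi^*\left\{\varphi'\left\{\frac{f_{a,n}(a^\top X_i)}{g_{a,n}(a^\top X_i)} \frac{g_n(X_i)}{f_n(X_i)}\right\}\right\}$.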

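Our goal is to estimate the minimum of $\Phi(g\frac{f_a}{g_a}, f)$. To do this, it is necessary for us to truncate
our samples:

Let us now consider a positive sequence $\theta_m$ such that $\theta_m \to 0$ and $y_m / \theta_m^2 \to 0$, where $y_m$ is the
almost sure convergence rate of the kernel density estimator - $y_m = O_P(m^{-\frac{2}{4+d}})$, see lemma F.10 -,
such that $y_m^{(1)} / \theta_m^2 \to 0$, where $y_m^{(1)}$ is defined by

$$\left|\varphi\left(\frac{g_m(x)}{f_m(x)} \frac{f_{b,m}(b^\top x)}{g_{b,m}(b^\top x)}\right) - \varphi\left(\frac{g(x)}{f(x)} \frac{f_b(b^\top x)}{g_b(b^\top x)}\right)\right| \le y_m^{(1)}$$

for all $b$ in $\mathbb{R}^d_*$ and all $x$ in $\mathbb{R}^d$, and finally such that $y_m^{(2)} / \theta_m^2 \to 0$, where $y_m^{(2)}$ is defined by

$$\left|\varphi'\left(\frac{g_m(x)}{f_m(x)} \frac{f_{b,m}(b^\top x)}{g_{b,m}(b^\top x)}\right) - \varphi'\left(\frac{g(x)}{f(x)} \frac{f_b(b^\top x)}{g_b(b^\top x)}\right)\right| \le y_m^{(2)}$$

for all $b$ in $\mathbb{R}^d_*$ and all $x$ in $\mathbb{R}^d$.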

We will generate $f_m$, $g_m$ and $g_{b,m}$ from the starting sample and we will select the $X_i$ and $Y_i$ vectors
such that $f_m(X_i) \ge \theta_m$ and $g_{b,m}(b^\top Y_i) \ge \theta_m$, for all $i$ and for all $b \in \mathbb{R}^d_*$.

The vectors meeting these conditions will be called $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$.
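
The truncation step can be pictured with a minimal Python sketch: a Gaussian kernel estimate is built from the sample, and only the observations whose estimated density reaches a threshold are kept. The sketch is one-dimensional for readability, and the helper names (`gaussian_kde`, `truncate`), the data, the bandwidth and the threshold value are all hypothetical; they do not reproduce the conditions (Hyp) nor the sequence $\theta_m$ of the article.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a one-dimensional Gaussian kernel density estimate as a function."""
    m = len(sample)
    norm = m * bandwidth * math.sqrt(2.0 * math.pi)
    def estimate(x):
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm
    return estimate

def truncate(sample, density_estimate, theta):
    """Keep only the observations whose estimated density is at least theta."""
    return [xi for xi in sample if density_estimate(xi) >= theta]

# Hypothetical data: a tight cluster plus one isolated point.
sample = [-0.2, -0.1, 0.0, 0.1, 0.2, 5.0]
m = len(sample)
f_m = gaussian_kde(sample, bandwidth=m ** (-1.0 / 5.0))   # illustrative bandwidth choice
theta = 0.2                                               # illustrative threshold
kept = truncate(sample, f_m, theta)
```

With these values the isolated observation falls below the threshold and is discarded, while the five clustered observations are kept; this is the kind of selection that produces the truncated sample $X_1, \ldots, X_n$.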

Consequently, the next proposition provides us with the condition required for us to derive our
estimations.

Proposition B.1. Using the notations introduced in Broniatowski (2009) and in section 3.1.1, it
holds $\lim_{n \to \infty} \sup_{a \in \mathbb{R}^d_*} |(B_1(n, a) - B_2(n, a)) - \Phi(g\frac{f_a}{g_a}, f)| = 0$.

Remark B.1. With the relative entropy, we can take for $\theta_m$ the expression $m^{-\nu}$, with $0 < \nu < \frac{1}{4+d}$.



C. Case study: $f$ is known

In this Annex, we will study the case when $f$ and $g$ are known. We will then use the notations
introduced in sections 3.1.1 and 3.1.2 with $f$ and $g$, i.e. no longer with their kernel estimates.

C.1. Convergence study and Asymptotic Inference at the $k^{th}$ step of the algorithm

In this paragraph, when $k$ is less than or equal to $d$, we will show that the sequence $(\check{a}_k)_n$
converges towards $a_k$ and that the sequence $(\check{g}^{(k)})_n$ converges towards $g^{(k)}$.

Both $\check{\gamma}_n$ and $\check{c}_n(a)$ are M-estimators and estimate $a_k$ - see Broniatowski (2009). We state

Proposition C.1. Assuming (H1) to (H3) hold. Both $\sup_{a \in \Theta} \|\check{c}_n(a) - a_k\|$ and $\check{\gamma}_n$ tend to $a_k$ a.s.

Finally, the following theorem shows us that $\check{g}^{(k)}$ converges uniformly almost everywhere towards
$g^{(k)}$, for any $k = 1, \ldots, d$.

Theorem C.1. Assuming (H1) to (H3) hold. Then, $\check{g}^{(k)} \to_n g^{(k)}$ a.s. and uniformly a.e.

The following theorem shows that $\check{g}^{(k)}$ converges at the rate $O_P(n^{-1/2})$ in three different
cases, namely for any given $x$, with the $L^1$ distance and with the $\Phi$-divergence:

Theorem C.2. Assuming (H0) to (H3) hold, for any $k = 1, \ldots, d$ and any $x \in \mathbb{R}^d$, we have

$$|\check{g}^{(k)}(x) - g^{(k)}(x)| = O_P(n^{-1/2}), \quad (C.1)$$

$$\int |\check{g}^{(k)}(x) - g^{(k)}(x)| dx = O_P(n^{-1/2}), \quad (C.2)$$

$$|K(\check{g}^{(k)}, f) - K(g^{(k)}, f)| = O_P(n^{-1/2}). \quad (C.3)$$

The following theorem shows that the laws of our estimators of $a_k$, namely $\check{c}_n(a_k)$ and $\check{\gamma}_n$,
converge towards a linear combination of Gaussian variables.

Theorem C.3. Assuming that conditions (H1) to (H6) hold, then

V^y[.(c„(flA,) - fl,) S.^rf(0,P|||M(fl,,fli)||2) +C.yVX0,P|||;M(flt,flt)||2) and 

V^^.(y„ - fl,) C.MAO,niM(ak,ak)f) +C.Nd(0,ni;M{ak,at)f) 

where = (P^M(flt, ak)iPg^^M(ak, ak) + V-^M{ak, at))), 

C = P^M(fli, at) and S = P-^M(ak, fl*) + Fg^^M(ak, a^) + V-^M{ak, at). 
C.2. A stopping rule for the procedure

We now assume that the algorithm does not stop after $d$ iterations. We then remark that it
still holds, for any $i > d$:

• $g^{(i)}(x) = g(x) \prod_{k=1}^{i} \frac{f_{a_k}(a_k^\top x)}{[g^{(k-1)}]_{a_k}(a_k^\top x)}$, with $g^{(0)} = g$.

• $K(g^{(0)}, f) \ge K(g^{(1)}, f) \ge K(g^{(2)}, f) \ge \ldots \ge 0$.

• Theorems C.1, C.2 and C.3.

Moreover, as explained in section 14 of Huber (1985) for the relative entropy, the sequence
$(\Phi(g^{(i-1)} \frac{f_{a_i}}{[g^{(i-1)}]_{a_i}}, f))_{i \ge 1}$ converges towards zero. Then, in this paragraph, we will show that $g^{(i)}$
converges towards $f$ in $i$. And finally, we will provide a stopping rule for this identification
procedure.

C.2.1. Representation of $f$

Under (H0), the following proposition shows us that the probability measure with density $g^{(k)}$
converges towards the probability measure with density $f$:

Proposition C.2 (Representation of $f$). We have $\lim_k g^{(k)} = f$ a.s.

C.2.2. Testing of the criteria

Through a test of the criteria, namely $a \mapsto \Phi(g^{(k-1)} \frac{f_a}{[g^{(k-1)}]_a}, f)$, we will build a stopping rule for
this procedure. First, the next theorem enables us to derive the law of the criteria.

Theorem C.4. Assuming that (H1) to (H3), (H6) and (H8) hold. Then,

$$\sqrt{n} (Var_P(M(\check{c}_n(\check{\gamma}_n), \check{\gamma}_n)))^{-1/2} (P_n M(\check{c}_n(\check{\gamma}_n), \check{\gamma}_n) - P_n M(a_k, a_k)) \xrightarrow{Law} N(0, I),$$

where $k$ represents the $k^{th}$ step of the algorithm and with $I$ being the identity matrix in $\mathbb{R}^d$.

Note that $k$ is fixed in theorem C.4 since $\check{\gamma}_n = \arg\inf_{a \in \Theta} \sup_{c \in \Theta} P_n M(c, a)$ where $M$ is a known
function of $k$ - see section 3.1.1. Thus, in the case where $\Phi(g^{(k-1)} \frac{f_{a_k}}{[g^{(k-1)}]_{a_k}}, f) = 0$, we obtain

Corollary C.1. Assuming that (H1) to (H3), (H6), (H7) and (H8) hold. Then,

$$\sqrt{n} (Var_P(M(\check{c}_n(\check{\gamma}_n), \check{\gamma}_n)))^{-1/2} (P_n M(\check{c}_n(\check{\gamma}_n), \check{\gamma}_n)) \xrightarrow{Law} N(0, I).$$

Hence, we propose the test of the null hypothesis

$(H_0)$: $K(g^{(k-1)} \frac{f_{a_k}}{[g^{(k-1)}]_{a_k}}, f) = 0$ versus $(H_1)$: $K(g^{(k-1)} \frac{f_{a_k}}{[g^{(k-1)}]_{a_k}}, f) \neq 0$.

Based on this result, we stop the algorithm, then, defining $a_k$ as the last vector generated, we
derive from corollary C.1 an $\alpha$-level confidence ellipsoid around $a_k$, namely

$$\mathcal{E}_k = \{b \in \mathbb{R}^d;\ \sqrt{n} (Var_P(M(b,b)))^{-1/2} P_n M(b,b) \le q_\alpha^{N(0,1)}\},$$

where $q_\alpha^{N(0,1)}$ is the quantile of an $\alpha$-level reduced centered normal distribution.
Consequently, the following corollary provides us with a confidence region for the above test:

Corollary C.2. $\mathcal{E}_k$ is a confidence region for the test of the null hypothesis $(H_0)$ versus $(H_1)$.



D. The first co-vector of $f$ simultaneously optimizes four problems

Let us first study Huber's analytic approach.
Let $\mathcal{R}'$ be the class of all positive functions $r$ defined on $\mathbb{R}$ and such that $f(x) r^{-1}(a^\top x)$ is a density
on $\mathbb{R}^d$ for all $a$ belonging to $\mathbb{R}^d_*$. The following proposition shows that there exists a vector $a$
such that $\frac{f_a}{g_a}$ minimizes $K(f r^{-1}, g)$ in $r$:

Proposition D.1 (Analytic Approach). There exists a vector $a$ belonging to $\mathbb{R}^d_*$ such that
$\arg\min_{r \in \mathcal{R}'} K(f r^{-1}, g) = \frac{f_a}{g_a}$, and $r(a^\top x) = \frac{f_a(a^\top x)}{g_a(a^\top x)}$ as well as $K(f, g) = K(f_a, g_a) + K(f\frac{g_a}{f_a}, g)$.

Let us also study Huber's synthetic approach:
Let $\mathcal{R}$ be the class of all positive functions $r$ defined on $\mathbb{R}$ and such that $g(x) r(a^\top x)$ is a density on
$\mathbb{R}^d$ for all $a$ belonging to $\mathbb{R}^d_*$. The following proposition shows that there exists a vector $a$ such
that $\frac{f_a}{g_a}$ minimizes $K(gr, f)$ in $r$.

Proposition D.2 (Synthetic Approach). There exists a vector $a$ belonging to $\mathbb{R}^d_*$ such that
$\arg\min_{r \in \mathcal{R}} K(f, gr) = \frac{f_a}{g_a}$, and $r(a^\top x) = \frac{f_a(a^\top x)}{g_a(a^\top x)}$ as well as $K(f, g) = K(f_a, g_a) + K(f, g\frac{f_a}{g_a})$.

In the meanwhile, the following proposition shows that there exists a vector $a$ such that $\frac{f_a}{g_a}$
minimizes $K(g, f r^{-1})$ in $r$.

Proposition D.3. There exists a vector $a$ belonging to $\mathbb{R}^d_*$ such that
$\arg\min_{r \in \mathcal{R}'} K(g, f r^{-1}) = \frac{f_a}{g_a}$, and $r(a^\top x) = \frac{f_a(a^\top x)}{g_a(a^\top x)}$ as well as $K(g, f) = K(g_a, f_a) + K(g, f\frac{g_a}{f_a})$.

Remark D.1. First, through property A.3 page 18 we get $K(f, g\frac{f_a}{g_a}) = K(g, f\frac{g_a}{f_a}) = K(f\frac{g_a}{f_a}, g)$ and
$K(f_a, g_a) = K(g_a, f_a)$. Thus, proposition D.3 implies that finding the argument of the maximum of
$K(g_a, f_a)$ amounts to finding the argument of the maximum of $K(f_a, g_a)$. Consequently, the criterion
of Huber's methodologies is $a \mapsto K(g_a, f_a)$. Second, if the $\Phi$-divergence is the relative entropy,
then our criterion is $a \mapsto K(g\frac{f_a}{g_a}, f)$ and property A.3 implies $K(g, f\frac{g_a}{f_a}) = K(g\frac{f_a}{g_a}, f)$.

To recapitulate, the choice of $r = \frac{f_a}{g_a}$ enables us to simultaneously solve the following four
optimisation problems, for $a \in \mathbb{R}^d_*$:

First, find $a$ such that $a = \arg\inf_{a \in \mathbb{R}^d_*} K(f\frac{g_a}{f_a}, g)$,
Second, find $a$ such that $a = \arg\inf_{a \in \mathbb{R}^d_*} K(f, g\frac{f_a}{g_a})$,
Third, find $a$ such that $a = \arg\sup_{a \in \mathbb{R}^d_*} K(g_a, f_a)$,
Fourth, find $a$ such that $a = \arg\inf_{a \in \mathbb{R}^d_*} K(g\frac{f_a}{g_a}, f)$.

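E. Discussion of the hypotheses

E.1. Discussion of (H2).

Let us work with the relative entropy and with $g$ and $a_1$.

For all $b \in \mathbb{R}^d_*$ we have $\int \varphi^*(\varphi'(\frac{g(x) f_b(b^\top x)}{f(x) g_b(b^\top x)})) f(x) dx = \int (\frac{g(x) f_b(b^\top x)}{f(x) g_b(b^\top x)} - 1) f(x) dx = 0$, since, for any
$b$ in $\mathbb{R}^d_*$, the function $x \mapsto g(x) \frac{f_b(b^\top x)}{g_b(b^\top x)}$ is a density. The complement of $\Theta^\Phi$ in $\mathbb{R}^d_*$ is $\emptyset$ and then the
supremum looked for in $\overline{\mathbb{R}}$ is $-\infty$. We can therefore conclude. It is interesting to note that we
obtain the same verification with $f$, $g^{(k-1)}$ and $a_k$.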

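E.2. Discussion of (H4).

This hypothesis consists in the following assumptions:

• We work with the relative entropy, (0)

• We have $f(\cdot / a_1^\top x) = g(\cdot / a_1^\top x)$, i.e. $K(g\frac{f_{a_1}}{g_{a_1}}, f) = 0$ - we could also derive the same proof with $f$,
$g^{(k-1)}$ and $a_k$ - (1)

Preliminary (A): Shows that $A = \{(c, x) \in \mathbb{R}^d_* \setminus \{a_1\} \times \mathbb{R}^d;\ \frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} > \frac{f_c(c^\top x)}{g_c(c^\top x)},\ g(x)\frac{f_c(c^\top x)}{g_c(c^\top x)} > f(x)\} = \emptyset$
through a reductio ad absurdum, i.e. if we assume $A \neq \emptyset$.
Thus, our hypothesis enables us to derive

$$f(x) = f(\cdot / a_1^\top x) f_{a_1}(a_1^\top x) = g(\cdot / a_1^\top x) f_{a_1}(a_1^\top x) > g(\cdot / c^\top x) f_c(c^\top x) > f$$

since $\frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} > \frac{f_c(c^\top x)}{g_c(c^\top x)}$ implies $g(\cdot / a_1^\top x) f_{a_1}(a_1^\top x) = g(x)\frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} > g(x)\frac{f_c(c^\top x)}{g_c(c^\top x)} = g(\cdot / c^\top x) f_c(c^\top x)$,
i.e. $f > f$. We can therefore conclude.

Preliminary (B): Shows that $B = \{(c, x) \in \mathbb{R}^d_* \setminus \{a_1\} \times \mathbb{R}^d;\ \frac{f_{a_1}(a_1^\top x)}{g_{a_1}(a_1^\top x)} < \frac{f_c(c^\top x)}{g_c(c^\top x)},\ g(x)\frac{f_c(c^\top x)}{g_c(c^\top x)} < f(x)\} = \emptyset$
through a reductio ad absurdum, i.e. if we assume $B \neq \emptyset$.
Thus, our hypothesis enables us to derive

$$f(x) = f(\cdot / a_1^\top x) f_{a_1}(a_1^\top x) = g(\cdot / a_1^\top x) f_{a_1}(a_1^\top x) < g(\cdot / c^\top x) f_c(c^\top x) < f$$

We can therefore conclude as above.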
Let us now verify {H4): 

WehavePM(c,fli)-PM(c,fl) = / ln( ){ ||gl§ - }g{x)dx. Moreover, the logarithm 

In is negative on {-x e Rf; < 1) and is positive on {;c G Mf; > 1). 

Thus, the preliminary studies (A) and (B) show that /»( ^^fj^'^ffl ) and {^^^4~^ ~ ^'^.t'^I ) always 
present a negative product. We can therefore conclude, since (c,a) i-> PM(c,a\) - PM{c,a) is 
not null for all c and for all a - with a + a\. 

F. Proofs 

This last section includes the proofs of most of the lemmas, propositions, theorems and
corollaries contained in the present article.

Remark F.1. 1/ (H0) - according to which $f$ and $g$ are assumed to be positive and bounded -
through lemma A.3 (see page 19) implies that $g^{(k)}$ and $\check{g}^{(k)}$ are positive and bounded.
2/ remark 2.1 page 5 implies that $f_n$, $g_n$ and $\check{g}^{(k)}$ are positive and bounded since we consider
a Gaussian kernel.

Proof of propositions D.1 and D.2. Let us first study proposition D.2.

Without loss of generality, we will prove this proposition with $x_1$ in lieu of $a^\top X$.

Let us define $g^* = gr$. We remark that $g$ and $g^*$ present the same density conditionally to $x_1$.

Indeed, $g^*_1(x_1) = \int g^*(x) dx_2 \ldots dx_d = \int r(x_1) g(x) dx_2 \ldots dx_d = r(x_1) \int g(x) dx_2 \ldots dx_d = r(x_1) g_1(x_1)$.

Thus, we can demonstrate this proposition.

We have $g(\cdot | x_1) = \frac{g(x_1, \ldots, x_d)}{g_1(x_1)}$ and $g_1(x_1) r(x_1)$ is the marginal density of $g^*$. Hence,

$\int g^* dx = \int g_1(x_1) r(x_1) g(\cdot | x_1) dx = \int g_1(x_1) \frac{f_1(x_1)}{g_1(x_1)} (\int g(\cdot | x_1) dx_2 \ldots dx_d) dx_1 = \int f_1(x_1) dx_1 = 1$ and
since $g^*$ is positive, then $g^*$ is a density. Moreover,

$$K(f, g^*) = \int f\{\ln(f) - \ln(g^*)\} dx, \quad (F.1)$$

$$= \int f\{\ln(f(\cdot | x_1)) - \ln(g^*(\cdot | x_1)) + \ln(f_1(x_1)) - \ln(g_1(x_1) r(x_1))\} dx,$$

$$= \int f\{\ln(f(\cdot | x_1)) - \ln(g(\cdot | x_1)) + \ln(f_1(x_1)) - \ln(g_1(x_1) r(x_1))\} dx, \quad (F.2)$$

as $g^*(\cdot | x_1) = g(\cdot | x_1)$. Since the minimum of this last equation (F.2) is reached through the
minimization of $\int f\{\ln(f_1(x_1)) - \ln(g_1(x_1) r(x_1))\} dx = K(f_1, g_1 r)$, then property A.1 necessarily
implies that $f_1 = g_1 r$, hence $r = f_1 / g_1$.
Finally, we have $K(f, g) - K(f, g^*) = \int f\{\ln(f_1(x_1)) - \ln(g_1(x_1))\} dx = K(f_1, g_1)$, which completes
the demonstration of proposition D.2.

Similarly, if we replace $f^* = f r^{-1}$ with $f$ and $g$ with $g^*$, we obtain proposition D.1. □

Proof of proposition D.3. The demonstration is very similar to the one for proposition D.2,
save for the fact that we now base our reasoning at row (F.1) on $\int g\{\ln(g^*) - \ln(f)\} dx$ instead of
$K(f, g^*) = \int f\{\ln(f) - \ln(g^*)\} dx$. □
Proof of proposition 3.1.

Without loss of generality, we reason with $x_1$ in lieu of $a^\top x$.

Let us define $g^* = gr$. We remark that $g$ and $g^*$ present the same density conditionally to $x_1$.
Indeed, $g^*_1(x_1) = \int g^*(x) dx_2 \ldots dx_d = \int h(x_1) g(x) dx_2 \ldots dx_d = h(x_1) \int g(x) dx_2 \ldots dx_d = h(x_1) g_1(x_1)$.
We can therefore prove this proposition.

First, since $f$ and $g$ are known, then, for any given function $h: x_1 \mapsto h(x_1)$, the application $T$,
which is defined by:

T : gi.lx,)'^^^^ ^ g(./xi)Mxi), 

T ■.f(./xi)Mx,)^ fUx,)Mxi) 
is measurable. 

Second, the above remark implies that 

1>(^*,/) = a>(g*(-/xi)ap^,/(./xi)/i(xi)) = <^(g(./xi)S^^^^,f(./xi)Mxi)). 
Consequently, property A.3 (page 18) implies:

<^(8(-/xir-'^j^J(./xi)Mxi)) > ^(T-\g(.lx,Y-^^j^),T-\f(.lx,)Mxi))) 
= by the very definition of T. 

which completes the proof of this proposition. □ 
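Proof of lemma F.1.

Lemma F.1. We have $g(\cdot / a_1^\top x, \ldots, a_j^\top x) = n(a_{j+1}^\top x, \ldots, a_d^\top x) = f(\cdot / a_1^\top x, \ldots, a_j^\top x)$.

Putting $A = (a_1, \ldots, a_d)$, let us determine $f$ in basis $A$. Let us first study the function defined by
$\psi: \mathbb{R}^d \to \mathbb{R}^d$, $x \mapsto (a_1^\top x, \ldots, a_d^\top x)$. We can immediately say that $\psi$ is continuous and since $A$ is a
basis, its bijectivity is obvious. Moreover, let us study its Jacobian.

By definition, it is $J_\psi(x_1, \ldots, x_d) = \left|\left(\frac{\partial \psi_i}{\partial x_j}\right)_{1 \le i, j \le d}\right| = \left|(a_{i,j})_{1 \le i, j \le d}\right| = |A| \neq 0$ since $A$ is a
basis. We can therefore infer: $\forall x \in \mathbb{R}^d$, $\exists! y \in \mathbb{R}^d$ such that $f(x) = |A|^{-1} \Psi(y)$, i.e. $\Psi$ (resp. $y$)
is the expression of $f$ (resp. of $x$) in basis $A$, namely $\Psi(y) = \tilde{n}(y_{j+1}, \ldots, y_d) \tilde{h}(y_1, \ldots, y_j)$, with $\tilde{n}$ and
$\tilde{h}$ being the expressions of $n$ and $h$ in basis $A$. Consequently, our results in the case where the
family $\{a_j\}_{1 \le j \le d}$ is the canonical basis of $\mathbb{R}^d$ still hold for $\Psi$ in basis $A$ - see section 2.1.2. And
then, if $\tilde{g}$ is the expression of $g$ in basis $A$, we have $\tilde{g}(\cdot / y_1, \ldots, y_j) = \tilde{n}(y_{j+1}, \ldots, y_d) = \Psi(\cdot / y_1, \ldots, y_j)$,
i.e. $g(\cdot / a_1^\top x, \ldots, a_j^\top x) = n(a_{j+1}^\top x, \ldots, a_d^\top x) = f(\cdot / a_1^\top x, \ldots, a_j^\top x)$. □

Proof of lemma F.2.

Lemma F.2. Should there exist a family $(a_i)_{i=1 \ldots d}$ such that $f(x) = n(a_{j+1}^\top x, \ldots, a_d^\top x) h(a_1^\top x, \ldots, a_j^\top x)$,
with $j < d$, with $f$, $n$ and $h$ being densities, then this family is an orthogonal basis of $\mathbb{R}^d$.

Using a reductio ad absurdum, we have $\int f(x) dx = 1 \neq +\infty = \int n(a_{j+1}^\top x, \ldots, a_d^\top x) h(a_1^\top x, \ldots, a_j^\top x) dx$.
We can therefore conclude. □

Proof of proposition B.1.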

Let us note first that we will prove this proposition for $k \ge 2$, i.e. in the case where $g^{(k-1)}$ is not
known. The initial case using the known density $g^{(0)} = g$ will be an immediate consequence
of the above.

Moreover, going forward, to be more legible, we will use $g$ (resp. $g_n$) in lieu of $g^{(k-1)}$ (resp.
its kernel estimate).

We can therefore remark that we have $f(X_i) \ge \theta_n - y_n$, $g(Y_i) \ge \theta_n - y_n$ and $g_b(b^\top Y_i) \ge \theta_n - y_n$, for
all $i$ and for all $b \in \mathbb{R}^d_*$, thanks to the uniform convergence of the kernel estimators. Indeed, we
have $f(X_i) = f(X_i) - f_n(X_i) + f_n(X_i) \ge -y_n + f_n(X_i)$, by definition of $y_n$, and then $f(X_i) \ge -y_n + \theta_n$,
by hypothesis on $f_n(X_i)$. This is also true for $g_n$ and $g_{b,n}$.

inis entails sup/^^jjd \T,^i=i^ ^ JlJamjm'-JIJfl^) J ^ ^^/w ^ U a.s. 

Indeed, let us remark that 

ilyn ,^n f.Aa^Y,) gn(Y>) . f.Ja^ Y,) r , g(x) Mb^x) . „/ ..n ^ i 

K^i=\^^ ^ga„(a-^Yi) UYi)' gan(a'^Yi)' J ^ ^f(x) gt(b-^x)> 6\-^> gaid^x) "-^1 

- |iy« r 'LM'^Y,) g„(Yi) . /„,„(a^y,) 1 v„ gTO i 

- ^n^i^if \g„Ja-^Yd UY,)' gaAa'^Yd n^i=\f f{Y,)i gJd'Yi) 

"^«^i=l'^ l^.C^^r,) J '■^ ^ f(x) gt(b-^x)> S^^'g^id^x) 

- \n^i=\V ^gaJfl'^Yd fn(Yd' ga.nC'^Yd n^i=\f Ig^Cfl^r,) /(F,) ' «<,(a^ 1-,) ' 

^\n^i=\f ^gAa-'Yd KYd'Ua-'Yd J ^ f(x) g^ib^^ x) > SK^' gAa^^ x) "-^1 

Moreover, since ^ \f' if^^j^^r^) Si^)Y^^\dx < as implied by lemma lA3l and since we 
assumed g such that <^{g,f) < oo and <i>{f,g) < oo and since b € 0*, the law of large numbers 
enables us to state that - / ^'(f|f ^) g(x)^^ dx\ a.s. 

Furthermore 1^2" ^" fa.M^Y,)g„(Y,) , U(''^Yd i , , Ua^jd g^] fd^\ 

< Ly |,-/f /°.»(«^y.) g„(yi) . /a.„(a^yi) ,./r /<.(a^i^.) g(Yi) ^ fa(a''Yl) , 
- n^i=\\V tg„„(flTy.) /„(F,oJg„„(flTy,) V \g^(a-^Yi) f(Yd' ga(a-'Yd^ 

and |^'{M2;|2|l|2)M2;g - ^ as a result of the hypotheses intially 

introduced on e„. Consequently, l^^^J^' {h^^^-^y^^^ - ^' [h^^j^^]MffA^\ ^ o, as 
it is a Cesaro mean. This enables us to conclude. Similarly, we obtain 

sup,,K. J^l^)) - / ^-(^'(iW a.s. □ 

Proof of lemma F.3. By definition of the closure of a set, we have

Lemma F.3. The set $\Gamma_c$ is closed in $L^1$ for the topology of the uniform convergence.

Proof of lemma F.4. Since $\Phi$ is greater than the $L^1$ distance, we have

Lemma F.4. For all $c > 0$, we have $\Gamma_c \subset B_{L^1}(f, c)$, where $B_{L^1}(f, c) = \{p \in L^1;\ \|f - p\|_1 \le c\}$.

Proof of lemma F.5. The definition of the closure of a set and lemma A.4 (see page 19) imply

Lemma F.5. $G$ is closed in $L^1$ for the topology of the uniform convergence.

Proof of lemma F.6.

Lemma F.6. $\inf_{a \in \mathbb{R}^d_*} \Phi(g^*, f)$ is reached when the $\Phi$-divergence is greater than the $L^1$ distance as
well as the $L^2$ distance.

Proof. Indeed, let $G$ be $\{g\frac{f_a}{g_a};\ a \in \mathbb{R}^d_*\}$ and $\Gamma_c$ be $\Gamma_c = \{p;\ K(p, f) \le c\}$ for all $c > 0$. From
lemmas F.3, F.4 and F.5 (see page 25), we get that $\Gamma_c \cap G$ is a compact for the topology of the uniform
convergence, if $\Gamma_c \cap G$ is not empty. Hence, and since property A.2 (see page 18) implies that
$Q \mapsto \Phi(Q, P)$ is lower semi-continuous in $L^1$ for the topology of the uniform convergence, then
the infimum is reached in $L^1$. (Taking for example $c = \Phi(g, f)$, this intersection is necessarily not empty
because we always have $\Phi(g\frac{f_a}{g_a}, f) \le \Phi(g, f)$.) Moreover, when the $\Phi$-divergence is greater than
the $L^2$ distance, the very definition of the $L^2$ space enables us to provide the same proof as for
the $L^1$ distance. □

Proof of lemma iRTl 

lemme F.7. For any p < d, we have f^^ = fa^ - see Ruber's analytic method -, ^i^"'' - ga,, - 
see Ruber's synthetic method - and — ga^ - see our algorithm. 

Proof. As it is equivalent to prove either our algorithm or Huber's, we will only develop here the 
proof for our algorithm. Assuming, without any loss of generality, that the a,, / = 1, are the 
vectors of the canonical basis, since g'^^^^Hx) - g(x) ^' Y'\ '^^'f^l --- \ we derive immediately 

that g^p - gp. We note that it is sufficient to operate a change in basis on the a, to obtain the 
general case. □ 

Proof of lemma EH 

lemme F.8. If there exits p, p < d, such that (S>(g^''\ f) = 0, then the family of{ai)i=\^,,p - derived 
from the construction ofg^''^ - is free and orthogonal. 

Proof. Without any loss of generality, let us assume that p - 2 and that the a, are the vectors 
of the canonical basis. Using a reductio ad absurdum with the hypotheses oi - (1,0, 0) and 
that fl2 = (a, 0, 0), where a e R, we get g^'^Kx) - g{x2, ■■, Xd I x\)f\{x\) and / = g^^\x) - 

g{x2, .., Xdlxx)fx{xi) [l°)]^"lax,) - Hence /(X2, = g{xz, .., x^/xi) [^"'^^"^J^^y 

It consequently implies that faaSflXi) - [^*'']ci;ai(Q'-^i) since 

1 = ^f(x2,..,xdlx,)dx2...dxd = ^gix2,..,Xdlx,)dx2...dxdj^^^^ = W&^y 

Therefore, g'^^ - g^^\ i.e. p - \ which leads to a contradiction. Hence, the family is free. 

Moreover, using a reductio ad absurdum we get the orthogonality. Indeed, we have 

J f(x)dx - 1 +00 - J n(aj^^x, a^x)h(ajx, ajx)dx. The use of the same argument as in 

the proof of lemma IFTSI enables us to infer the orthogonality of (a/)/=i,..,p. □ 

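Proof of lemma F.9

Lemma F.9. If there exists $p$, $p \le d$, such that $\Phi(g^{(p)}, f) = 0$, where $g^{(p)}$ is built from the free and
orthogonal family $a_1, \dots, a_j$, then there exists a free and orthogonal family $(b_k)_{k=j+1,\dots,d}$ of vectors
of $\mathbb{R}^d$, such that

$$g^{(p)}(x) = g(b_{j+1}^\top x, \dots, b_d^\top x / a_1^\top x, \dots, a_j^\top x)\, f_{a_1}(a_1^\top x) \cdots f_{a_j}(a_j^\top x)$$

and such that $\mathbb{R}^d = \mathrm{Vect}\{a_i\} \overset{\perp}{\oplus} \mathrm{Vect}\{b_k\}$.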

Proof. Through the incomplete basis theorem and similarly as in lemma F.8, we obtain the result
thanks to Fubini's theorem. □

Proof of lemma F.10

Lemma F.10. For any continuous density $f$, we have $y_m = |f_m(x) - f(x)| = O_P(m^{-\frac{2}{4+d}})$.






Defining $b_m(x)$ as $b_m(x) = |E(f_m(x)) - f(x)|$, we have $y_m \le |f_m(x) - E(f_m(x))| + b_m(x)$. Moreover,
from page 150 of Scott (1992), we derive that $b_m(x) = O_P(\sum_{j=1}^{d} h_j^2)$ where $h_j = O_P(m^{-\frac{1}{4+d}})$.

Then, we obtain $b_m(x) = O_P(m^{-\frac{2}{4+d}})$. Finally, since the central limit theorem rate is $O_P(m^{-\frac{1}{2}})$, we
infer that $y_m \le O_P(m^{-\frac{1}{2}}) + O_P(m^{-\frac{2}{4+d}}) = O_P(m^{-\frac{2}{4+d}})$. □
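The rate in this lemma can be explored empirically. The sketch below is only an illustration under stated assumptions: it works in dimension $d = 1$, takes $f$ to be the standard normal density, uses a Gaussian kernel with bandwidth $h = m^{-1/(4+d)}$, and evaluates the error at the single point $x = 0$. The function names `gaussian_kde_at` and `kde_abs_error` are hypothetical helpers, not part of the paper.

```python
import math
import random


def gaussian_kde_at(x, sample, h):
    """Gaussian kernel density estimate f_m(x) with bandwidth h (one dimension)."""
    m = len(sample)
    c = 1.0 / (m * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)


def kde_abs_error(m, x=0.0, d=1, seed=0):
    """Return y_m = |f_m(x) - f(x)| for a N(0, 1) sample of size m and h = m^(-1/(4+d))."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(m)]
    h = m ** (-1.0 / (4 + d))
    true_value = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return abs(gaussian_kde_at(x, sample, h) - true_value)


if __name__ == "__main__":
    # Print the observed error next to the reference rate m^(-2/(4+d)) with d = 1.
    for m in (100, 1000, 10000):
        print(m, kde_abs_error(m), m ** (-2.0 / 5.0))
```

A single run only gives one realisation of $y_m$; averaging over several seeds would give a steadier picture of the decay.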
Proof of proposition 3.3. Proposition 3.3 comes immediately from proposition B.1 page 20 and
lemma C.1 page 20. □
Proof of proposition |3j3) Let us first show by induction the following assertion 

$P(k)$: $g^{(k)}$ allows a deconvolution $g^{(k)} = \bar{g}^{(k)} * \phi$.
Initialisation: For $k = 0$, we get the result since $g = g^{(0)}$ is elliptic.
Going from $k$ to $k + 1$: Let us assume $P(k)$ is true; we then show that $P(k + 1)$ holds.
Since the family of $a_i$, $i \le k + 1$, is free (see lemma F.8), we define $B$ as the basis of $\mathbb{R}^d$ such
that its $k + 1$ first vectors are the $a_i$, $i \le k + 1$ (see the incomplete basis theorem for its existence).
Thus, in B and using the same procedure to prove lemma |FT]page|24l we have 
'g^'^\x) = Consequently, the very definition of the convolution product, the 

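Fubini's theorem and the hypothesis made on the Elliptical family imply that
$g^{(k)}(x) = g^{(k)}(\cdot / x_{k+1})\, g^{(k)}_{k+1}(x_{k+1})$ with $g^{(k)}(\cdot / x_{k+1}) = \bar{g}^{(k)}(\cdot / x_{k+1}) * E_{d-1}(0, \sigma^2 I_{d-1}, \xi_{d-1})$ and with
$g^{(k)}_{k+1}(x_{k+1}) = \bar{g}^{(k)}_{k+1}(x_{k+1}) * E_1(0, \sigma^2, \xi_1)$. Finally, replacing $g^{(k)}_{k+1}$ with $f_{k+1} = \bar{f}_{k+1} * E_1(0, \sigma^2, \xi_1)$,
we conclude this induction with $g^{(k+1)} = g^{(k)}(\cdot / x_{k+1})\, f_{k+1}(x_{k+1})$.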

Now, let us consider $\psi$ (resp. $\bar{\psi}$, $\psi^{(k)}$, $\bar{\psi}^{(k)}$) the characteristic function of $f$ (resp. $\bar{f}$, $g^{(k)}$, $\bar{g}^{(k)}$). We
then have $\psi(s) = \bar{\psi}(s)\,\Psi(\frac{1}{2}\sigma^2 |s|^2)$ and $\psi^{(k)}(s) = \bar{\psi}^{(k)}(s)\,\Psi(\frac{1}{2}\sigma^2 |s|^2)$. Hence, $\psi$ and $\psi^{(k)}$ are less than or
equal to $\Psi(\frac{1}{2}\sigma^2 |s|^2)$, which is integrable by hypothesis, i.e. $\psi$ and $\psi^{(k)}$ are absolutely integrable.
We then obtain $g^{(k)}(x) = (2\pi)^{-d} \int \psi^{(k)}(s)\, e^{-i s^\top x}\,ds$ and $f(x) = (2\pi)^{-d} \int \psi(s)\, e^{-i s^\top x}\,ds$.
Moreover, since the sequence $(\psi^{(k)})$ uniformly converges and since $\psi$ and $\psi^{(k)}$ are less than or equal to
$\Psi(\frac{1}{2}\sigma^2 |s|^2)$, the dominated convergence theorem implies that

$\lim_k |f(x) - g^{(k)}(x)| \le (2\pi)^{-d} \int \lim_k |\psi(s) - \psi^{(k)}(s)|\,ds = 0$ a.s., i.e. $\lim_k \sup_x |f(x) - g^{(k)}(x)| = 0$ a.s.
Finally, since, by hypothesis, $(2\pi)^{-d} \int |\psi(s) - \psi^{(k)}(s)|\,ds \le 2 (2\pi)^{-d} \int \Psi(\frac{1}{2}\sigma^2 |s|^2)\,ds < \infty$, the
above limit and the dominated convergence theorem imply that $\lim_k \int |f(x) - g^{(k)}(x)|\,dx = 0$. □
Proof of corollary 3.3. Through the dominated convergence theorem and through theorem 3.4,
we get the result using a reductio ad absurdum. □
Proof of lemma F.11

Lemma F.11. Let us consider the sequence $(a_i)$ defined in (2.3) page 6.
We then have $\lim_n \lim_k K(g_n^{(k)}, f_n) = 0$ a.s.

Proof. Through the relationship (2.3) and through remark D.1 page 22, as well as the additive
relation of proposition D.1, we can say that $0 \le \dots \le K(g^{(\infty)}, f) \le \dots \le K(g^{(k)}, f) \le \dots \le K(g, f)$,
where $g^{(\infty)} = \lim_k g^{(k)}$, which is a density by construction. And through proposition C.2 we
obtain that $K(g^{(\infty)}, f) = 0$, i.e.

$0 = K(g^{(\infty)}, f) \le \dots \le K(g^{(k)}, f) \le \dots \le K(g, f)$, (*).
Moreover, let $(g_n^{(k)})_k$ be the sequence of densities such that $g_n^{(k)}$ is the kernel estimate of $g^{(k)}$. Since
we derive from remark lFTi pagelSSl an integrable upper bound of $g_n^{(k)}$, for all $k$, which is greater
than $f$ (see also the definition of $\varphi$ in the proof of theorem 3.4), the dominated convergence
theorem implies that, for any $k$, $\lim_n K(g_n^{(k)}, f_n) = K(g^{(k)}, f)$, i.e., from a certain given rank $n_0$, we
have $0 \le \dots \le K(g_n^{(\infty)}, f_n) \le \dots \le K(g_n^{(k)}, f_n) \le \dots \le K(g_n, f_n)$, (**).
Consequently, through lemma F.12 page 28, there exists a $k$ such that

$0 \le \dots \le K(\Psi^{(\infty)}_{n,k}, f_n) \le \dots \le K(g_n^{(\infty)}, f_n) \le \dots \le K(\Psi^{(\infty)}_{n,k-1}, f_n) \le \dots \le K(g_n, f_n)$, (***)
where $\Psi^{(\infty)}_{n,k}$ is a density such that $\Psi^{(\infty)}_{n,k} = \lim_k g_n^{(k)}$.

Finally, through the dominated convergence theorem and taking the limit as $n$ in (***), we get

$0 = K(g^{(\infty)}, f) = \lim_n K(g_n^{(\infty)}, f_n) \ge \lim_n K(\Psi^{(\infty)}_{n,k}, f_n) \ge 0$.
The dominated convergence theorem enables us to conclude:

$0 = \lim_n K(\Psi^{(\infty)}_{n,k}, f_n) = \lim_n \lim_k K(g_n^{(k)}, f_n)$. □

Proof of lemma F.12

Lemma F.12. With the notation of the proof of lemma F.11, we have

$0 \le \dots \le K(\Psi^{(\infty)}_{n,k}, f_n) \le \dots \le K(g_n^{(\infty)}, f_n) \le \dots \le K(\Psi^{(\infty)}_{n,k-1}, f_n) \le \dots \le K(g_n, f_n)$, (***)

Proof. First, as explained in section D, we have $K(f^{(k)}, g) - K(f^{(k+1)}, g) = K(f^{(k)}_{a_{k+1}}, g_{a_{k+1}})$. Moreover,
through remark D.1 page 22, we also derive that $K(f^{(k)}, g) = K(g^{(k)}, f)$. Then, $K(f^{(k)}_{a_{k+1}}, g_{a_{k+1}})$
is the decreasing step of the relative entropies in (*) and leading to $0 = K(g^{(\infty)}, f)$. Similarly,
the very construction of (**) implies that $K(f^{(k)}_{a_{k+1},n}, g_{a_{k+1},n})$ is the decreasing step of the relative
entropies in (**) and leading to $K(g_n^{(\infty)}, f_n)$.

Second, through the conclusion of the section |D] and lemma 14.2 of Huber's article, we obtain 
that -^^(/it+,,n'§iJA+i,;i) converges - in decreasing and in k - towards a positive function of n - that 
we will call 

Third, the convergence of (g**^)^: - see proposition IC. 21 - implies that, for any given «, the se- 



quence {K{g^lf', f„))ic is not finite. Then, through relationship (**), there exists a k such that 

0<K(g't'\fn)-K(g^r\fn)<^n. 

Thus, since Q K{Q,P) is l.s.c. - see propertv IA.2I page fTSl - relationship (**) implies 

Q 

Proof of theorem 3.1. First, by the very definition of the kernel estimator, $g_n^{(0)} = g_n$ converges
towards $g$. Moreover, the continuity of $a \mapsto f_{a,n}$ and $a \mapsto g_{a,n}$ and proposition 3.3 imply that
$g_n^{(1)} = g_n^{(0)} \frac{f_{a_1,n}}{g^{(0)}_{a_1,n}}$ converges towards $g^{(1)}$. Finally, since, for any $k$, $g_n^{(k)} = g_n^{(k-1)} \frac{f_{a_k,n}}{[g^{(k-1)}]_{a_k,n}}$, we conclude by
an immediate induction. □
Proof of theorem C.2

relationship dC.ll l. Let us consider ^ j - {-^^riiyjjT^ - \p^^y\~{d^- Since / and g are bounded, 
it is easy to prove that from a certain rank, we get, for any x given in R'' 

'Remark F.2. First, based on what we stated earlier, for any given x and from a certain rank, 
there is a constant R>0 independent from n, such that 
max( ,.,, n', -T X , nTTiT-n^) < R = R(x) = 0(1). 

Second, since a^ is an M— estimator of a^, its convergence rate is (9p(n^'^^). 

Thus using simple functions, we infer an upper and lower bound for faj and for fa^ We 
therefore reach the following conclusion: 



< Op(n-"^). (F3) 



We finally obtain 



, fdia'i^x) , fa ia^x) , fa ia^x) , fd idjx) [e<-''-'>]„ .{a"!"x) 

ittA: •'"j^ J ' _ yrk •'^j^ j ' i _ rrfc -"^j^ j ' i-rrfc J^j^ ] ' j -* _ -i i 



fa- 
Based on relationship (IF.3b . the expression mj-o] .^^a^^x) — / .(aix) tends towards 1 at a rate



_i/9 L fd-iai^x) [g'-'-^\\a^. X) 

of (9p(n ' ) for all j. Consequently, n^=i [gO-'"')] ..(d '^x) — / .(J^.J) — tends towards 1 at a rate of 
OpirT^^^). Thus from a certain rank, we get 

in* 1 r-cMirC^T ^ - , , J = (9p(n-i/2)(9 (1) ^ (9p(„-i/2). 

In conclusion, we obtain = g(x)\n)^, x) - [gc^^x) I ^ Cp(«-'/2). 

relationship (jO). The relationship ICT] of theorem IC2l implies that - 1| = C»p(«"'^^) 
because, for any given x, g*^-'(x)| ^(n^'^j - 1| = - Consequently, there exists a 

smooth function C of M'' in such that 

lim„^co n^'^^CW = and |||^)^ - 1| < n-^'^Cix), for any x. 

We then have / - gW(;c)|c/x = / g^''Hx)\^^ - l\dx < J g^''\x)C{x)n-^^^dx. 

Moreover, sup,,^. - g(«(x)| = sup,,^. g<*>(x)|f5| - 1| 

- supj.£][jd g'^''\x)C{x)rr^l^ — > U.S., by theorem FCll 
This implies that sup^^jj^ g**-'(x)C(x) < oo a.s., i.e. sup^.^j^^ C(x) < oo a.s. since g**' has been 
assumed to be positive and bounded - see remark lFTI 

Thus, / gW(x)C(x)t/x < sup C. / g^''Hx)dx - sup C < oo since g*^*^^ is a density, therefore we can 
conclude / - g^''Hx)\dx < sup Cn^'^^ ^ (9p(«-'/2)_ □ 

relationship (IC.3) . We have 

with the line before last being derived from theorem lA. 1 I pageflSland where ip : x t-^ xln(x) - x+ 1 
is a convex function and where S > 0. We get the same expression as the one found in our Proof 
of Relationship (IC.2b section, we then obtain K(g^''\f) - K(g^''\f) < Ofin'^'^). Similarly, we 
get K(g^''\ f) - K(g^''\ f) < Op{n-^l^). We can therefore conclude. □ 
Proof of lemma F.13

Lemma F.13. We keep the notations introduced in Appendix B. It holds $n = O(m^{\frac{1}{2}})$.
Proof. Let $N$ be the random variable such that

$N = \sum_{j=1}^{m} \mathbf{1}_{\{f_m(X_j) > \theta_m,\ g(Y_j) > \theta_m\}}$. Since the events $\{f_m(X_j) > \theta_m\}$ and $\{g(Y_j) > \theta_m\}$ are independent
from one another and since $\{g(Y_j) > \theta_m\} \subset \{g_m(Y_j) > -y_m + \theta_m\}$, we can say that

$n = m\,P(f_m(X_j) > \theta_m,\ g(Y_j) > \theta_m) \le m\,P(f_m(X_j) > \theta_m)\,P(g_m(Y_j) > -y_m + \theta_m)$.
Consequently, let us study P(/,„(X,) > 9m). Let i^i)i=i...m be the sequence such that, for any / 

and any x in R'', = nf^^ ^^-^g-J^T)' - / ul^^^^e'^-^'^^' f{x)dx. Hence, for any 

given j and conditionally to X\, . . . , Xj^i, Xj+u . . . , X,„, the variables i^i(Xj))'*^^ are i.i.d. and 
centered, have same second moment, and are such that 

^ KiJinkh, + KiJi;^, = l.ilnrd/^ni^hj' since sup^e^-'-^ < 1. 

Moreover, noting that = ^,I.';U^i(x) + (271)-""^ ^I.T=iKA' I e'^-'-'^^' f(x)dx, 

we have fm{Xj) > 0,„ o J^SJl^^K^,) + (InT'/^-l^'Ll^Ul^hj' /g-^*"^'' f(x)dx > e„ 

« ^KM^j) ^ (Sn, - (2nr'"'i^liKjh' fe-'^'"^'' f(x)dx - }MXj))^ 

with ^j{Xj) = 0. Then, defining t (resp. e) as f = 2.{2TiT'^I^Y¥l^.^hJ^ (resp. 

e^{9m- (27r)-''^2n^/^^/^-il2m^]-[rf ^ J f(x)^x)^)^ the Rennet's inequality - lOevrove 
lfl98l page 160- imphes that P(;;^27_, ^i(Xj) > s/Xu Xj^u Xj^u . . . , X,„) < 2.exp(-^^^). 
Finally, since the X, are i.i.d. and since / ( / nf^jg ^( >- f(x)dx)f(y)dy < 1, then the law of



large numbers implies that ^'^'Jli Jll'l^^e 



f(x)dx 



UK 



f(x)f(y)dxdy 



a.s. Consequently, since < v < - see remark IbTI - and since e ' < x 2 when x > 0, we 
obtain, after calculation, that, from a certain rank, expi- ^'^^^j'^ ) = 0(m^^), i.e., from a certain 
rank, P(fm{Yj) > 0,„) - 0(nr^). Similarly, we infer P(g(F/) > = 0{m^^). In conclusion, 
we can say that $n = m\,P(f_m(X_j) > \theta_m)\,P(g_m(Y_j) > \theta_m) = O(m^{\frac{1}{2}})$. Similarly, we derive the same
result as above for any step of our method. □

Proof of theorem 3.2. First, from lemma F.10 we derive that, for any $x$,

suPfleR'' \fa.n{a x)-fa{a x)\ = Op(n Then, let us consider *Py = aU,.^ , — (FTTtVt' we have 

fly, II J «j J 

i.e. |>P;| = C>p(«-ji''='-rai^>i) since /„,(fljx) = 0(1) and ^^^""(fljx) = 0(1). We can therefore 
conclude similarly as in theorem C.2. □
Proof of theorem 3.3. We get the theorem through theorem C.3 and proposition B.1. □
Proof of theorem C.3. First of all, let us remark that hypotheses (H1) to (H3) imply that $\gamma_n$ and
$c_n(a_k)$ converge towards $a_k$ in probability.

Hypothesis (H4) enables us to differentiate under the integral sign. After calculation,

P|M(fl,,fl,) = P|;M(fl,,fl,) = 0, 



> d- 

dajdbj ^ 



and consequently P-^^^M(ak, at) 



■ /got <)bj fg„i 



f dx. 



aa:db 



■M(ak,ak) 



M(ak,ak), which implies. 



-M(ak, Ok) - P-rr^Miak, at) 



= P 



M{ak, au), = P-^M{ak, flj) + P 



dajdbj 



dbjda 



■M{ak,ak). 
P„|-M(/7,fl) = 



P„|;M(/7(fl),fl) = 
\§M(cn(ak),7n)^0{E0) 
\£M(c„(ak),yn) = Q{El) ' 



The very definition of the estimators y„ and c„{ak), implies that 

P„|M(c„(fl,),y„) = 
\i;M{c„(ak),fn) + P„-§j;M(c„{ak),f„)£c„{ak) = 0, 
Under {H5) and {H6), and using a Taylor development of the (£0) (resp. (£1)) equation, we 
infer there exists (c„, -y„) (resp. (c„, %)) on the interval [(cn(ak), y„), (fli, ak)] such that 
-P,4M(flt,flt) = [(P^^M(ak,ak)V +o-p(l),(P^kM(ak,ak)V +op(l)]a„. 
(resp. -P„£M(ak,ak) = [(P^M(flt, a^))^ + op(l), (P|^M(at, a*))^ + op(l)]fl„) 
with a„ = ({c„(ak) - OkV , (y„ - OkV). Thus we get 



Pj§sM(ak,ak) 



P-r^^M(ak,ak) P-I^M(ak,ak) 



-1 r 



¥„^M{ak,ak) 
¥,£M(ak,ak) 



+ Op(l) 



y/rr(PSu;M(ak,ak)SrK(8^^.nr' 



V^^M{ak,ak) + ^K{gfj^,f) P^^M(ak,ak) 



P^M(ak,ak) 



P-ighM(ak,ak) 
M(ak,ak)
-P„lM(ak,ak) 



•Op(l) 



Moreover, the central limit theorem implies: V„-^M(ak,ak) A/i/(0,P||^M(aj;,fl<.)|p), 

P„£M(fli,fli,) Nd(0,niM(ak,ak)f), since P|M(fl*,fl*) = P£M(fl*,fl*) = 0, which leads 
us to the result. □ 
Proof of proposition C.2. Let us consider $\psi$ (resp. $\psi^{(k)}$) the characteristic function of $f$ (resp.
$g^{(k)}$). Let us also consider the sequence $(a_i)$ defined in (2.3) page 6.

We have $|\psi(t) - \psi^{(k)}(t)| \le \int |f(x) - g^{(k)}(x)|\,dx \le K(g^{(k)}, f)$. As explained in section 14 of Huber's
article and through remark D.1 page 22, as well as through the additive relation of proposition
D.1, we can say that $\lim_k K(g^{(k)}, f) = 0$. Consequently, we get $\lim_k g^{(k)} = f$. □

Proof of theorem 3.4. We recall that $g_n^{(k)}$ is the kernel estimator of $g^{(k)}$. Since the relative entropy
is greater than the $L^1$-distance, we then have

$\lim_n \lim_k K(g_n^{(k)}, f_n) \ge \lim_n \lim_k \int |g_n^{(k)}(x) - f_n(x)|\,dx$.
Moreover, Fatou's lemma implies that

$\lim_k \int |g_n^{(k)}(x) - f_n(x)|\,dx \ge \int \lim_k \left[|g_n^{(k)}(x) - f_n(x)|\right]dx = \int |[\lim_k g_n^{(k)}(x)] - f_n(x)|\,dx$
and $\lim_n \int |[\lim_k g_n^{(k)}(x)] - f_n(x)|\,dx \ge \int \lim_n \left[|[\lim_k g_n^{(k)}(x)] - f_n(x)|\right]dx$

$= \int |[\lim_n \lim_k g_n^{(k)}(x)] - \lim_n f_n(x)|\,dx$.
Through lemma F.11 we then obtain that $0 = \lim_n \lim_k K(g_n^{(k)}, f_n) \ge \int |[\lim_n \lim_k g_n^{(k)}(x)] - \lim_n f_n(x)|\,dx \ge 0$,
i.e. that $\int |[\lim_n \lim_k g_n^{(k)}(x)] - \lim_n f_n(x)|\,dx = 0$.

Moreover, for any given $k$ and any given $n$, the function $g_n^{(k)}$ is a convex combination of multivariate
Gaussian distributions. As derived at remark 2.1 of page 5, for all $k$, the determinant of
the covariance of the random vector with density $g^{(k)}$ is greater than or equal to the product of
a positive constant times the determinant of the covariance of the random vector with density $f$.
The form of the kernel estimate therefore implies that there exists an integrable function $\varphi$ such
that, for any given $k$ and any given $n$, we have $|g_n^{(k)}| \le \varphi$.

Finally, the dominated convergence theorem enables us to say that $\lim_n \lim_k g_n^{(k)} = \lim_n f_n = f$,
since $f_n$ converges towards $f$ and since $\int |[\lim_n \lim_k g_n^{(k)}(x)] - \lim_n f_n(x)|\,dx = 0$. □
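Proof of theorem C.4. Through a Taylor development of $P_n M(c_n(a_k), \gamma_n)$ of rank 2, we get at
point $(a_k, a_k)$:

$$
\begin{aligned}
P_n M(c_n(a_k), \gamma_n) ={}& P_n M(a_k, a_k) + P_n \tfrac{\partial}{\partial a} M(a_k, a_k)(\gamma_n - a_k)^\top + P_n \tfrac{\partial}{\partial b} M(a_k, a_k)(c_n(a_k) - a_k)^\top \\
&+ \tfrac{1}{2}\Big\{ (\gamma_n - a_k)^\top P_n \tfrac{\partial^2}{\partial a\,\partial a} M(a_k, a_k)(\gamma_n - a_k) + (c_n(a_k) - a_k)^\top P_n \tfrac{\partial^2}{\partial b\,\partial a} M(a_k, a_k)(\gamma_n - a_k) \\
&\quad + (\gamma_n - a_k)^\top P_n \tfrac{\partial^2}{\partial a\,\partial b} M(a_k, a_k)(c_n(a_k) - a_k) + (c_n(a_k) - a_k)^\top P_n \tfrac{\partial^2}{\partial b\,\partial b} M(a_k, a_k)(c_n(a_k) - a_k) \Big\}
\end{aligned}
$$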
The lemma below enables us to conclude.

Lemma F.14. Let $H$ be an integrable function and let $C = \int H\,dP$ and $C_n = \int H\,dP_n$,
then $C_n - C = O_P(\frac{1}{\sqrt{n}})$.

Thus we get $P_n M(c_n(a_k), \gamma_n) = P_n M(a_k, a_k) + O_P(\frac{1}{n})$,
i.e. $\sqrt{n}\,(P_n M(c_n(a_k), \gamma_n) - P M(a_k, a_k)) = \sqrt{n}\,(P_n M(a_k, a_k) - P M(a_k, a_k)) + o_P(1)$.
Hence $\sqrt{n}\,(P_n M(c_n(a_k), \gamma_n) - P M(a_k, a_k))$ abides by the same limit distribution as
$\sqrt{n}\,(P_n M(a_k, a_k) - P M(a_k, a_k))$, which is $N(0, \mathrm{Var}_P(M(a_k, a_k)))$. □
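The $\sqrt{n}$ scaling of an empirical mean, as used in the lemma above, can be checked with a small simulation. This is a minimal sketch under explicit assumptions that are not in the paper: $P$ is taken to be $N(0, 1)$ and $H(x) = x^2$, so that $C = \int H\,dP = 1$ and $\mathrm{Var}_P(H) = 2$. The helper name `scaled_deviation` is hypothetical.

```python
import math
import random


def scaled_deviation(n, seed=0):
    """Return sqrt(n) * (C_n - C) for H(x) = x^2 under P = N(0, 1), where C = 1."""
    rng = random.Random(seed)
    # C_n is the empirical mean of H over an i.i.d. sample of size n.
    c_n = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
    return math.sqrt(n) * (c_n - 1.0)


if __name__ == "__main__":
    # The scaled deviations should stay of order one as n grows,
    # in line with an approximate N(0, 2) limit under these assumptions.
    for n in (100, 1000, 10000, 100000):
        print(n, scaled_deviation(n, seed=n))
```

Under these assumptions the printed values fluctuate without growing with $n$, which is what a rate of $O_P(n^{-1/2})$ for $C_n - C$ predicts.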
Proof of theorem 3.5. Through proposition B.1 and theorem C.4 we derive theorem 3.5. □
Proof of theorem I3l7l We immediately get the proof from theorem [l4l □ 
Proof of theorem 3.8. Since $\Phi(g^{(1)}, f) = 0$, then, through lemma F.9, we deduce that the density
of $b_2^\top X / a_1^\top X$, with $a_1 = (0, 1)'$ and $b_2 = (1, 0)'$, is the same as the one of $b_2^\top Y / a_1^\top Y$.
Hence, we derive that $E(X_1 / X_2) = E(Y_1 / Y_2)$ and also that the regression between $X_1$ and $X_2$ is
$X_1 = E(Y_1) + \frac{\mathrm{Cov}(Y_1, Y_2)}{\mathrm{Var}(Y_2)}\,(X_2 - E(Y_2)) + \varepsilon$, where $\varepsilon$ is a centered random variable such that it is
orthogonal to $E(X_1 / X_2)$. □
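The regression line in this proof has the usual Gaussian conditional-mean form, which can be sketched in a few lines. The code below is an illustration only: it assumes the line is $x_2 \mapsto E(Y_1) + \frac{\mathrm{Cov}(Y_1, Y_2)}{\mathrm{Var}(Y_2)}(x_2 - E(Y_2))$ and that the moments of $Y$ are replaced by empirical moments of a sample. The helper names `regression_line` and `predict` are hypothetical.

```python
def regression_line(y1, y2):
    """Return (mean of y1, mean of y2, Cov(y1, y2) / Var(y2)) from paired samples."""
    n = len(y1)
    mean1 = sum(y1) / n
    mean2 = sum(y2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(y1, y2)) / n
    var2 = sum((b - mean2) ** 2 for b in y2) / n
    return mean1, mean2, cov / var2


def predict(x2, mean1, mean2, slope):
    """Evaluate mean1 + slope * (x2 - mean2), the fitted conditional mean at x2."""
    return mean1 + slope * (x2 - mean2)


if __name__ == "__main__":
    # Toy data lying exactly on the line y1 = 2 + 3 * y2.
    y2 = [0.0, 1.0, 2.0, 3.0]
    y1 = [2.0, 5.0, 8.0, 11.0]
    m1, m2, slope = regression_line(y1, y2)
    print(slope, predict(4.0, m1, m2, slope))
```

On the toy data the slope is recovered exactly because the points are collinear; with noisy data the residual plays the role of the centered term $\varepsilon$.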
Proof of theorem 3.9. We infer this proof similarly to the proof of theorem 3.8. □
Proof of corollary 3.4. Let us first assume that the $b_k$ and the $a_i$ are the canonical basis of $\mathbb{R}^d$.
Then, for any $i \neq j$, $Y_i$ is independent from $Y_j$, i.e. $E(Y_k / Y_1, \dots, Y_j) = E(Y_k)$. Consequently, the
regression between $X_k$ and $(X_1, \dots, X_j)$ is given by $X_k = E(Y_k) + \varepsilon_k$, where $\varepsilon_k$ is a centered random
variable such that it is orthogonal to $E(X_k / X_1, \dots, X_j)$.

We then derive the general case thanks to the methodology used in the proof of lemma lFTI
with the transformation matrix $B = (a_1, \dots, a_j, b_{j+1}, \dots, b_d)$. □